
COMPOUNDS AND METHODS FOR IMMUNOTHERAPY 
AND DIAGNOSIS OF TUBERCULOSIS 

TECHNICAL FIELD

The present invention relates generally to detecting, treating and 
preventing Mycobacterium tuberculosis infection. The invention is more particularly 
related to polypeptides comprising a Mycobacterium tuberculosis antigen, or a portion 
or other variant thereof, and the use of such polypeptides for diagnosing and vaccinating 
against Mycobacterium tuberculosis infection.

BACKGROUND OF THE INVENTION 

Tuberculosis is a chronic, infectious disease that is generally caused by
infection with Mycobacterium tuberculosis. It is a major disease in developing
countries, as well as an increasing problem in developed areas of the world, with about
8 million new cases and 3 million deaths each year. Although the infection may be 
asymptomatic for a considerable period of time, the disease is most commonly 
manifested as an acute inflammation of the lungs, resulting in fever and a nonproductive 
cough. If left untreated, serious complications and death typically result. 

Although tuberculosis can generally be controlled using extended
antibiotic therapy, such treatment is not sufficient to prevent the spread of the disease.
Infected individuals may be asymptomatic, but contagious, for some time. In addition,
although compliance with the treatment regimen is critical, patient behavior is difficult
to monitor. Some patients do not complete the course of treatment, which can lead to
ineffective treatment and the development of drug resistance.

Inhibiting the spread of tuberculosis requires effective vaccination and
accurate, early diagnosis of the disease. Currently, vaccination with live bacteria is the
most efficient method for inducing protective immunity. The most common
Mycobacterium employed for this purpose is Bacillus Calmette-Guerin (BCG), an
avirulent strain of Mycobacterium bovis. However, the safety and efficacy of BCG are a
source of controversy and some countries, such as the United States, do not vaccinate
the general public. Diagnosis is commonly achieved using a skin test, which involves
intradermal exposure to tuberculin PPD (purified protein derivative). Antigen-specific
T cell responses result in measurable induration at the injection site by 48-72 hours after
injection, which indicates exposure to Mycobacterial antigens. Sensitivity and
specificity have, however, been a problem with this test, and individuals vaccinated
with BCG cannot be distinguished from infected individuals.

While macrophages have been shown to act as the principal effectors of
M. tuberculosis immunity, T cells are the predominant inducers of such immunity. The
essential role of T cells in protection against M. tuberculosis infection is illustrated by
the frequent occurrence of M. tuberculosis in AIDS patients, due to the depletion of
CD4 T cells associated with human immunodeficiency virus (HIV) infection.
Mycobacterium-reactive CD4 T cells have been shown to be potent producers of
gamma-interferon (IFN-γ), which, in turn, has been shown to trigger the
anti-mycobacterial effects of macrophages in mice. While the role of IFN-γ in humans is
less clear, studies have shown that 1,25-dihydroxy-vitamin D3, either alone or in
combination with IFN-γ or tumor necrosis factor-alpha, activates human macrophages
to inhibit M. tuberculosis infection. Furthermore, it is known that IFN-γ stimulates
human macrophages to make 1,25-dihydroxy-vitamin D3. Similarly, IL-12 has been
shown to play a role in stimulating resistance to M. tuberculosis infection. For a review
of the immunology of M. tuberculosis infection see Chan and Kaufmann in
Tuberculosis: Pathogenesis, Protection and Control, Bloom (ed.), ASM Press,
Washington, DC, 1994.

Accordingly, there is a need in the art for improved vaccines and
methods for preventing, treating and detecting tuberculosis. The present invention
fulfills these needs and further provides other related advantages.

SUMMARY OF THE INVENTION 

Briefly stated, this invention provides compounds and methods for
preventing and diagnosing tuberculosis. In one aspect, polypeptides are provided
comprising an immunogenic portion of a soluble M. tuberculosis antigen, or a variant of
such an antigen that differs only in conservative substitutions and/or modifications. In
one embodiment of this aspect, the soluble antigen has one of the following N-terminal
sequences:

(a) Asp-Pro-Val-Asp-Ala-Val-Ile-Asn-Thr-Thr-Cys-Asn-Tyr-Gly-
Gln-Val-Val-Ala-Ala-Leu; (SEQ ID No. 120)

(b) Ala-Val-Glu-Ser-Gly-Met-Leu-Ala-Leu-Gly-Thr-Pro-Ala-Pro-
Ser; (SEQ ID No. 121)

(c) Ala-Ala-Met-Lys-Pro-Arg-Thr-Gly-Asp-Gly-Pro-Leu-Glu-Ala-
Ala-Lys-Glu-Gly-Arg; (SEQ ID No. 122)

(d) Tyr-Tyr-Trp-Cys-Pro-Gly-Gln-Pro-Phe-Asp-Pro-Ala-Trp-Gly-
Pro; (SEQ ID No. 123)

(e) Asp-Ile-Gly-Ser-Glu-Ser-Thr-Glu-Asp-Gln-Gln-Xaa-Ala-Val;
(SEQ ID No. 124)

(f) Ala-Glu-Glu-Ser-Ile-Ser-Thr-Xaa-Glu-Xaa-Ile-Val-Pro; (SEQ ID
No. 125)

(g) Asp-Pro-Glu-Pro-Ala-Pro-Pro-Val-Pro-Thr-Thr-Ala-Ala-Ser-
Pro-Pro-Ser; (SEQ ID No. 126)

(h) Ala-Pro-Lys-Thr-Tyr-Xaa-Glu-Glu-Leu-Lys-Gly-Thr-Asp-Thr-
Gly; (SEQ ID No. 127)

(i) Asp-Pro-Ala-Ser-Ala-Pro-Asp-Val-Pro-Thr-Ala-Ala-Gln-Leu-
Thr-Ser-Leu-Leu-Asn-Ser-Leu-Ala-Asp-Pro-Asn-Val-Ser-Phe-
Ala-Asn; (SEQ ID No. 128)

(j) Xaa-Asp-Ser-Glu-Lys-Ser-Ala-Thr-Ile-Lys-Val-Thr-Asp-Ala-
Ser; (SEQ ID No. 134)

(k) Ala-Gly-Asp-Thr-Xaa-Ile-Tyr-Ile-Val-Gly-Asn-Leu-Thr-Ala-
Asp; (SEQ ID No. 135) or

(l) Ala-Pro-Glu-Ser-Gly-Ala-Gly-Leu-Gly-Gly-Thr-Val-Gln-Ala-
Gly; (SEQ ID No. 136)
wherein Xaa may be any amino acid. 
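
The N-terminal sequences above are written in three-letter code, with Xaa
standing for any amino acid. As a minimal sketch (not part of the original
disclosure), such a sequence can be converted to one-letter code and compared
against the start of a candidate protein sequence, treating each Xaa as a
wildcard. The function names and the candidate sequence in the usage note are
hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes; Xaa (unknown) maps to X.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"


def to_one_letter(three_letter_seq: str) -> str:
    """Convert 'Asp-Ile-Gly' style notation to 'DIG'."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split("-"))


def n_terminal_pattern(three_letter_seq: str) -> "re.Pattern[str]":
    """Build a regex anchored at the N-terminus; each Xaa matches any residue."""
    one_letter = to_one_letter(three_letter_seq)
    return re.compile("^" + one_letter.replace("X", ANY_RESIDUE))


def has_n_terminus(protein: str, three_letter_seq: str) -> bool:
    """True if the one-letter protein sequence starts with the given N-terminus."""
    return n_terminal_pattern(three_letter_seq).match(protein) is not None


# Sequence (e), SEQ ID No. 124, as recited above.
SEQ_124 = "Asp-Ile-Gly-Ser-Glu-Ser-Thr-Glu-Asp-Gln-Gln-Xaa-Ala-Val"
```

For instance, has_n_terminus("DIGSESTEDQQKAVTT", SEQ_124) is true for this
made-up candidate, because the lysine falls at the Xaa position.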

In a related aspect, polypeptides are provided comprising an 
immunogenic portion of an M. tuberculosis antigen, or a variant of such an antigen that




WO 98/16646 PCT/US97/18293 



differs only in conservative substitutions and/or modifications, the antigen having one 
of the following N-terminal sequences: 

(m) Xaa-Tyr-Ile-Ala-Tyr-Xaa-Thr-Thr-Ala-Gly-Ile-Val-Pro-Gly-Lys-
Ile-Asn-Val-His-Leu-Val; (SEQ ID No. 137) or

(n) Asp-Pro-Pro-Asp-Pro-His-Gln-Xaa-Asp-Met-Thr-Lys-Gly-Tyr-
Tyr-Pro-Gly-Gly-Arg-Arg-Xaa-Phe; (SEQ ID No. 129)
wherein Xaa may be any amino acid. 

In another embodiment, the soluble M. tuberculosis antigen comprises an
amino acid sequence encoded by a DNA sequence selected from the group consisting of
the sequences recited in SEQ ID Nos.: 1, 2, 4-10, 13-25, 52, 99 and 101, the
complements of said sequences, and DNA sequences that hybridize to a sequence
recited in SEQ ID Nos.: 1, 2, 4-10, 13-25, 52, 99 and 101 or a complement thereof
under moderately stringent conditions.

In a related aspect, the polypeptides comprise an immunogenic portion
of an M. tuberculosis antigen, or a variant of such an antigen that differs only in
conservative substitutions and/or modifications, wherein the antigen comprises an
amino acid sequence encoded by a DNA sequence selected from the group consisting of
the sequences recited in SEQ ID Nos.: 26-51, 138, 139, 163-183 and 201, the
complements of said sequences, and DNA sequences that hybridize to a sequence
recited in SEQ ID Nos.: 26-51, 138, 139, 163-183 and 201 or a complement thereof
under moderately stringent conditions.

In related aspects, DNA sequences encoding the above polypeptides,
expression vectors comprising these DNA sequences and host cells transformed or
transfected with such expression vectors are also provided.

In another aspect, the present invention provides fusion proteins
comprising a first and a second inventive polypeptide or, alternatively, an inventive
polypeptide and a known M. tuberculosis antigen.

Within other aspects, the present invention provides pharmaceutical
compositions that comprise one or more of the above polypeptides, or a DNA molecule
encoding such polypeptides, and a physiologically acceptable carrier. The invention
also provides vaccines comprising one or more of the polypeptides as described above
and a non-specific immune response enhancer, together with vaccines comprising one
or more DNA sequences encoding such polypeptides and a non-specific immune
response enhancer.

In yet another aspect, methods are provided for inducing protective
immunity in a patient, comprising administering to a patient an effective amount of one
or more of the above polypeptides.

In further aspects of this invention, methods and diagnostic kits are
provided for detecting tuberculosis in a patient. The methods comprise contacting
dermal cells of a patient with one or more of the above polypeptides and detecting an
immune response on the patient's skin. The diagnostic kits comprise one or more of the
above polypeptides in combination with an apparatus sufficient to contact the
polypeptide with the dermal cells of a patient.

In yet other aspects, methods are provided for detecting tuberculosis in a
patient, such methods comprising contacting dermal cells of a patient with one or more
polypeptides encoded by a DNA sequence selected from the group consisting of SEQ
ID Nos.: 3, 11, 12, 140, 141, 156-160, 189-193, 199, 200 and 203, the complements of
said sequences, and DNA sequences that hybridize to a sequence recited in SEQ ID
Nos.: 3, 11, 12, 140, 141, 156-160, 189-193, 199, 200 and 203; and detecting an
immune response on the patient's skin. Diagnostic kits for use in such methods are also
provided.



These and other aspects of the present invention will become apparent
upon reference to the following detailed description and attached drawings. All
references disclosed herein are hereby incorporated by reference in their entirety as if
each were incorporated individually.



BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCE IDENTIFIERS 

Figures 1A and B illustrate the stimulation of proliferation and
interferon-γ production in T cells derived from a first and a second
M. tuberculosis-immune donor, respectively, by the 14 Kd, 20 Kd and 26 Kd antigens
described in Example 1.

Figure 2 illustrates the stimulation of proliferation and interferon-γ
production in T cells derived from an M. tuberculosis-immune individual by the two
representative polypeptides TbRa3 and TbRa9.

Figures 3A-D illustrate the reactivity of antisera raised against secretory
M. tuberculosis proteins, the known M. tuberculosis antigen 85b and the inventive
antigens Tb38-1 and TbH-9, respectively, with M. tuberculosis lysate (lane 2),
M. tuberculosis secretory proteins (lane 3), recombinant Tb38-1 (lane 4), recombinant
TbH-9 (lane 5) and recombinant 85b (lane 5).

Figure 4A illustrates the stimulation of proliferation in a TbH-9-specific
T cell clone by secretory M. tuberculosis proteins, recombinant TbH-9 and a control
antigen, TbRa11.

Figure 4B illustrates the stimulation of interferon-γ production in a
TbH-9-specific T cell clone by secretory M. tuberculosis proteins, PPD and recombinant
TbH-9.

Figures 5A and B illustrate the stimulation of proliferation and
interferon-γ production in TbH9-specific T cells by the fusion protein TbH9-Tb38-1.

Figures 6A and B illustrate the stimulation of proliferation and
interferon-γ production in Tb38-1-specific T cells by the fusion protein TbH9-Tb38-1.

Figures 7A and B illustrate the stimulation of proliferation and
interferon-γ production in T cells previously shown to respond to both TbH-9 and
Tb38-1 by the fusion protein TbH9-Tb38-1.

Figures 8A and B illustrate the stimulation of proliferation and
interferon-γ production in T cells derived from a first M. tuberculosis-immune
individual by the representative polypeptides XP-1, RDIF6, RDIF8, RDIF10 and
RDIF11.

Figures 9A and B illustrate the stimulation of proliferation and
interferon-γ production in T cells derived from a second M. tuberculosis-immune
individual by the representative polypeptides XP-1, RDIF6, RDIF8, RDIF10 and
RDIF11.



SEQ. ID NO. 1 is the DNA sequence of TbRa1.
SEQ. ID NO. 2 is the DNA sequence of TbRa10.
SEQ. ID NO. 3 is the DNA sequence of TbRa11.
SEQ. ID NO. 4 is the DNA sequence of TbRa12.
SEQ. ID NO. 5 is the DNA sequence of TbRa13.
SEQ. ID NO. 6 is the DNA sequence of TbRa16.
SEQ. ID NO. 7 is the DNA sequence of TbRa17.
SEQ. ID NO. 8 is the DNA sequence of TbRa18.
SEQ. ID NO. 9 is the DNA sequence of TbRa19.
SEQ. ID NO. 10 is the DNA sequence of TbRa24.
SEQ. ID NO. 11 is the DNA sequence of TbRa26.
SEQ. ID NO. 12 is the DNA sequence of TbRa28.
SEQ. ID NO. 13 is the DNA sequence of TbRa29.
SEQ. ID NO. 14 is the DNA sequence of TbRa2A.
SEQ. ID NO. 15 is the DNA sequence of TbRa3.
SEQ. ID NO. 16 is the DNA sequence of TbRa32.
SEQ. ID NO. 17 is the DNA sequence of TbRa35.
SEQ. ID NO. 18 is the DNA sequence of TbRa36.
SEQ. ID NO. 19 is the DNA sequence of TbRa4.
SEQ. ID NO. 20 is the DNA sequence of TbRa9.
SEQ. ID NO. 21 is the DNA sequence of TbRaB.
SEQ. ID NO. 22 is the DNA sequence of TbRaC.
SEQ. ID NO. 23 is the DNA sequence of TbRaD.
SEQ. ID NO. 24 is the DNA sequence of YYWCPG.
SEQ. ID NO. 25 is the DNA sequence of AAMK.
SEQ. ID NO. 26 is the DNA sequence of TbL-23.
SEQ. ID NO. 27 is the DNA sequence of TbL-24.
SEQ. ID NO. 28 is the DNA sequence of TbL-25.
SEQ. ID NO. 29 is the DNA sequence of TbL-28.
SEQ. ID NO. 30 is the DNA sequence of TbL-29.
SEQ. ID NO. 31 is the DNA sequence of TbH-5.
SEQ. ID NO. 32 is the DNA sequence of TbH-8.
SEQ. ID NO. 33 is the DNA sequence of TbH-9.
SEQ. ID NO. 34 is the DNA sequence of TbM-1.
SEQ. ID NO. 35 is the DNA sequence of TbM-3.
SEQ. ID NO. 36 is the DNA sequence of TbM-6.
SEQ. ID NO. 37 is the DNA sequence of TbM-7.
SEQ. ID NO. 38 is the DNA sequence of TbM-9.
SEQ. ID NO. 39 is the DNA sequence of TbM-12.
SEQ. ID NO. 40 is the DNA sequence of TbM-13.
SEQ. ID NO. 41 is the DNA sequence of TbM-14.
SEQ. ID NO. 42 is the DNA sequence of TbM-15.
SEQ. ID NO. 43 is the DNA sequence of TbH-4.
SEQ. ID NO. 44 is the DNA sequence of TbH-4-FWD.
SEQ. ID NO. 45 is the DNA sequence of TbH-12.
SEQ. ID NO. 46 is the DNA sequence of Tb38-1.
SEQ. ID NO. 47 is the DNA sequence of Tb38-4.
SEQ. ID NO. 48 is the DNA sequence of TbL-17.
SEQ. ID NO. 49 is the DNA sequence of TbL-20.
SEQ. ID NO. 50 is the DNA sequence of TbL-21.
SEQ. ID NO. 51 is the DNA sequence of TbH-16.
SEQ. ID NO. 52 is the DNA sequence of DPEP.
SEQ. ID NO. 53 is the deduced amino acid sequence of DPEP.
SEQ. ID NO. 54 is the protein sequence of DPV N-terminal Antigen.
SEQ. ID NO. 55 is the protein sequence of AVGS N-terminal Antigen.
SEQ. ID NO. 56 is the protein sequence of AAMK N-terminal Antigen.
SEQ. ID NO. 57 is the protein sequence of YYWC N-terminal Antigen.
SEQ. ID NO. 58 is the protein sequence of DIGS N-terminal Antigen.
SEQ. ID NO. 59 is the protein sequence of AEES N-terminal Antigen.
SEQ. ID NO. 60 is the protein sequence of DPEP N-terminal Antigen.



SEQ. ID NO. 61 is the protein sequence of APKT N-terminal Antigen.
SEQ. ID NO. 62 is the protein sequence of DPAS N-terminal Antigen.
SEQ. ID NO. 63 is the deduced amino acid sequence of TbRa1.
SEQ. ID NO. 64 is the deduced amino acid sequence of TbRa10.
SEQ. ID NO. 65 is the deduced amino acid sequence of TbRa11.
SEQ. ID NO. 66 is the deduced amino acid sequence of TbRa12.
SEQ. ID NO. 67 is the deduced amino acid sequence of TbRa13.
SEQ. ID NO. 68 is the deduced amino acid sequence of TbRa16.
SEQ. ID NO. 69 is the deduced amino acid sequence of TbRa17.
SEQ. ID NO. 70 is the deduced amino acid sequence of TbRa18.
SEQ. ID NO. 71 is the deduced amino acid sequence of TbRa19.
SEQ. ID NO. 72 is the deduced amino acid sequence of TbRa24.
SEQ. ID NO. 73 is the deduced amino acid sequence of TbRa26.
SEQ. ID NO. 74 is the deduced amino acid sequence of TbRa28.
SEQ. ID NO. 75 is the deduced amino acid sequence of TbRa29.
SEQ. ID NO. 76 is the deduced amino acid sequence of TbRa2A.
SEQ. ID NO. 77 is the deduced amino acid sequence of TbRa3.
SEQ. ID NO. 78 is the deduced amino acid sequence of TbRa32.
SEQ. ID NO. 79 is the deduced amino acid sequence of TbRa35.
SEQ. ID NO. 80 is the deduced amino acid sequence of TbRa36.
SEQ. ID NO. 81 is the deduced amino acid sequence of TbRa4.
SEQ. ID NO. 82 is the deduced amino acid sequence of TbRa9.
SEQ. ID NO. 83 is the deduced amino acid sequence of TbRaB.
SEQ. ID NO. 84 is the deduced amino acid sequence of TbRaC.
SEQ. ID NO. 85 is the deduced amino acid sequence of TbRaD.
SEQ. ID NO. 86 is the deduced amino acid sequence of YYWCPG.
SEQ. ID NO. 87 is the deduced amino acid sequence of TbAAMK.
SEQ. ID NO. 88 is the deduced amino acid sequence of Tb38-1.
SEQ. ID NO. 89 is the deduced amino acid sequence of TbH-4.
SEQ. ID NO. 90 is the deduced amino acid sequence of TbH-8.
SEQ. ID NO. 91 is the deduced amino acid sequence of TbH-9.
SEQ. ID NO. 92 is the deduced amino acid sequence of TbH-12.
SEQ. ID NO. 93 is the amino acid sequence of Tb38-1 Peptide 1.
SEQ. ID NO. 94 is the amino acid sequence of Tb38-1 Peptide 2.
SEQ. ID NO. 95 is the amino acid sequence of Tb38-1 Peptide 3.
SEQ. ID NO. 96 is the amino acid sequence of Tb38-1 Peptide 4.
SEQ. ID NO. 97 is the amino acid sequence of Tb38-1 Peptide 5.
SEQ. ID NO. 98 is the amino acid sequence of Tb38-1 Peptide 6.
SEQ. ID NO. 99 is the DNA sequence of DPAS.
SEQ. ID NO. 100 is the deduced amino acid sequence of DPAS.
SEQ. ID NO. 101 is the DNA sequence of DPV.
SEQ. ID NO. 102 is the deduced amino acid sequence of DPV.
SEQ. ID NO. 103 is the DNA sequence of ESAT-6.
SEQ. ID NO. 104 is the deduced amino acid sequence of ESAT-6.
SEQ. ID NO. 105 is the DNA sequence of TbH-8-2.
SEQ. ID NO. 106 is the DNA sequence of TbH-9FL.
SEQ. ID NO. 107 is the deduced amino acid sequence of TbH-9FL.
SEQ. ID NO. 108 is the DNA sequence of TbH-9-1.
SEQ. ID NO. 109 is the deduced amino acid sequence of TbH-9-1.
SEQ. ID NO. 110 is the DNA sequence of TbH-9-4.
SEQ. ID NO. 111 is the deduced amino acid sequence of TbH-9-4.
SEQ. ID NO. 112 is the DNA sequence of Tb38-1F2 IN.
SEQ. ID NO. 113 is the DNA sequence of Tb38-2F2 RP.
SEQ. ID NO. 114 is the deduced amino acid sequence of Tb37-FL.
SEQ. ID NO. 115 is the deduced amino acid sequence of Tb38-IN.
SEQ. ID NO. 116 is the DNA sequence of Tb38-1F3.
SEQ. ID NO. 117 is the deduced amino acid sequence of Tb38-1F3.
SEQ. ID NO. 118 is the DNA sequence of Tb38-1F5.
SEQ. ID NO. 119 is the DNA sequence of Tb38-1F6.
SEQ. ID NO. 120 is the deduced N-terminal amino acid sequence of DPV.
SEQ. ID NO. 121 is the deduced N-terminal amino acid sequence of AVGS.
SEQ. ID NO. 122 is the deduced N-terminal amino acid sequence of AAMK.
SEQ. ID NO. 123 is the deduced N-terminal amino acid sequence of YYWC.
SEQ. ID NO. 124 is the deduced N-terminal amino acid sequence of DIGS.
SEQ. ID NO. 125 is the deduced N-terminal amino acid sequence of AEES.
SEQ. ID NO. 126 is the deduced N-terminal amino acid sequence of DPEP.
SEQ. ID NO. 127 is the deduced N-terminal amino acid sequence of APKT.
SEQ. ID NO. 128 is the deduced amino acid sequence of DPAS.
SEQ. ID NO. 129 is the protein sequence of DPPD N-terminal Antigen.
SEQ ID NO. 130-133 are the protein sequences of four DPPD cyanogen
bromide fragments.
SEQ ID NO. 134 is the N-terminal protein sequence of XDS antigen.
SEQ ID NO. 135 is the N-terminal protein sequence of AGD antigen.
SEQ ID NO. 136 is the N-terminal protein sequence of APE antigen.
SEQ ID NO. 137 is the N-terminal protein sequence of XYI antigen.
SEQ ID NO. 138 is the DNA sequence of TbH-29.
SEQ ID NO. 139 is the DNA sequence of TbH-30.
SEQ ID NO. 140 is the DNA sequence of TbH-32.
SEQ ID NO. 141 is the DNA sequence of TbH-33.
SEQ ID NO. 142 is the predicted amino acid sequence of TbH-29.
SEQ ID NO. 143 is the predicted amino acid sequence of TbH-30.
SEQ ID NO. 144 is the predicted amino acid sequence of TbH-32.
SEQ ID NO. 145 is the predicted amino acid sequence of TbH-33.
SEQ ID NO: 146-151 are PCR primers used in the preparation of a fusion
protein containing TbRa3, 38 kD and Tb38-1.
SEQ ID NO: 152 is the DNA sequence of the fusion protein containing TbRa3,
38 kD and Tb38-1.
SEQ ID NO: 153 is the amino acid sequence of the fusion protein containing
TbRa3, 38 kD and Tb38-1.
SEQ ID NO: 154 is the DNA sequence of the M. tuberculosis antigen 38 kD.
SEQ ID NO: 155 is the amino acid sequence of the M. tuberculosis antigen 38
kD.
SEQ ID NO: 156 is the DNA sequence of XP14.
SEQ ID NO: 157 is the DNA sequence of XP24.
SEQ ID NO: 158 is the DNA sequence of XP31.
SEQ ID NO: 159 is the 5' DNA sequence of XP32.
SEQ ID NO: 160 is the 3' DNA sequence of XP32.
SEQ ID NO: 161 is the predicted amino acid sequence of XP14.
SEQ ID NO: 162 is the predicted amino acid sequence encoded by the reverse
complement of XP14.
SEQ ID NO: 163 is the DNA sequence of XP27.
SEQ ID NO: 164 is the DNA sequence of XP36.





SEQ ID NO: 165 is the 5' DNA sequence of XP4.
SEQ ID NO: 166 is the 5' DNA sequence of XP5.
SEQ ID NO: 167 is the 5' DNA sequence of XP17.
SEQ ID NO: 168 is the 5' DNA sequence of XP30.
SEQ ID NO: 169 is the 5' DNA sequence of XP2.
SEQ ID NO: 170 is the 3' DNA sequence of XP2.
SEQ ID NO: 171 is the 5' DNA sequence of XP3.
SEQ ID NO: 172 is the 3' DNA sequence of XP3.
SEQ ID NO: 173 is the 5' DNA sequence of XP6.
SEQ ID NO: 174 is the 3' DNA sequence of XP6.
SEQ ID NO: 175 is the 5' DNA sequence of XP18.
SEQ ID NO: 176 is the 3' DNA sequence of XP18.
SEQ ID NO: 177 is the 5' DNA sequence of XP19.
SEQ ID NO: 178 is the 3' DNA sequence of XP19.
SEQ ID NO: 179 is the 5' DNA sequence of XP22.
SEQ ID NO: 180 is the 3' DNA sequence of XP22.
SEQ ID NO: 181 is the 5' DNA sequence of XP25.
SEQ ID NO: 182 is the 3' DNA sequence of XP25.
SEQ ID NO: 183 is the full-length DNA sequence of TbH4-XP1.
SEQ ID NO: 184 is the predicted amino acid sequence of TbH4-XP1.
SEQ ID NO: 185 is the predicted amino acid sequence encoded by the reverse
complement of TbH4-XP1.
SEQ ID NO: 186 is a first predicted amino acid sequence encoded by XP36.
SEQ ID NO: 187 is a second predicted amino acid sequence encoded by XP36.
SEQ ID NO: 188 is the predicted amino acid sequence encoded by the reverse
complement of XP36.
SEQ ID NO: 189 is the DNA sequence of RDIF2.
SEQ ID NO: 190 is the DNA sequence of RDIF5.
SEQ ID NO: 191 is the DNA sequence of RDIF8.
SEQ ID NO: 192 is the DNA sequence of RDIF10.
SEQ ID NO: 193 is the DNA sequence of RDIF11.
SEQ ID NO: 194 is the predicted amino acid sequence of RDIF2.
SEQ ID NO: 195 is the predicted amino acid sequence of RDIF5.
SEQ ID NO: 196 is the predicted amino acid sequence of RDIF8.
SEQ ID NO: 197 is the predicted amino acid sequence of RDIF10.
SEQ ID NO: 198 is the predicted amino acid sequence of RDIF11.
SEQ ID NO: 199 is the 5' DNA sequence of RDIF12.
SEQ ID NO: 200 is the 3' DNA sequence of RDIF12.
SEQ ID NO: 201 is the DNA sequence of RDIF7.
SEQ ID NO: 202 is the predicted amino acid sequence of RDIF7.
SEQ ID NO: 203 is the DNA sequence of DIF2-1.
SEQ ID NO: 204 is the predicted amino acid sequence of DIF2-1.
SEQ ID NO: 205-212 are PCR primers used in the preparation of a fusion
protein containing TbRa3, 38 kD, Tb38-1 and DPEP (hereinafter referred to as
TbF-2).
SEQ ID NO: 213 is the DNA sequence of the fusion protein TbF-2.
SEQ ID NO: 214 is the amino acid sequence of the fusion protein TbF-2.
DETAILED DESCRIPTION OF THE INVENTION

As noted above, the present invention is generally directed to
compositions and methods for preventing, treating and diagnosing tuberculosis. The
compositions of the subject invention include polypeptides that comprise at least one
immunogenic portion of an M. tuberculosis antigen, or a variant of such an antigen that
differs only in conservative substitutions and/or modifications. Polypeptides within the
scope of the present invention include, but are not limited to, immunogenic soluble
M. tuberculosis antigens. A "soluble M. tuberculosis antigen" is a protein of
M. tuberculosis origin that is present in M. tuberculosis culture filtrate. As used herein,
the term "polypeptide" encompasses amino acid chains of any length, including full
length proteins (i.e., antigens), wherein the amino acid residues are linked by covalent
peptide bonds. Thus, a polypeptide comprising an immunogenic portion of one of the
above antigens may consist entirely of the immunogenic portion, or may contain
additional sequences. The additional sequences may be derived from the native
M. tuberculosis antigen or may be heterologous, and such sequences may (but need not)
be immunogenic.

"Immunogenic," as used herein, refers to the ability to elicit an immune
response (e.g., cellular) in a patient, such as a human, and/or in a biological sample. In
particular, antigens that are immunogenic (and immunogenic portions or other variants
of such antigens) are capable of stimulating cell proliferation, interleukin-12 production
and/or interferon-γ production in biological samples comprising one or more cells
selected from the group of T cells, NK cells, B cells and macrophages, where the cells
are derived from an M. tuberculosis-immune individual. Polypeptides comprising at
least an immunogenic portion of one or more M. tuberculosis antigens may generally be
used to detect tuberculosis or to induce protective immunity against tuberculosis in a
patient.

The compositions and methods of this invention also encompass variants
of the above polypeptides. A "variant," as used herein, is a polypeptide that differs
from the native antigen only in conservative substitutions and/or modifications, such
that the ability of the polypeptide to induce an immune response is retained. Such
variants may generally be identified by modifying one of the above polypeptide
sequences, and evaluating the immunogenic properties of the modified polypeptide
using, for example, the representative procedures described herein.

A "conservative substitution" is one in which an amino acid is
substituted for another amino acid that has similar properties, such that one skilled in
the art of peptide chemistry would expect the secondary structure and hydropathic
nature of the polypeptide to be substantially unchanged. In general, the following
groups of amino acids represent conservative changes: (1) ala, pro, gly, glu, asp, gln,
asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and
(5) phe, tyr, trp, his.
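
The five groups listed above can be read as a lookup table. The following is a
minimal sketch, assuming equal-length sequences given as lists of three-letter
residue names, of how a candidate variant might be checked against those
groups. The function names are hypothetical, and the check covers substitutions
only; it does not model the deletions, additions or other modifications that a
variant may also carry, nor does it test whether immunogenicity is retained.

```python
# The five conservative-change groups as recited in the text above.
CONSERVATIVE_GROUPS = [
    {"ala", "pro", "gly", "glu", "asp", "gln", "asn", "ser", "thr"},
    {"cys", "ser", "tyr", "thr"},
    {"val", "ile", "leu", "met", "ala", "phe"},
    {"lys", "arg", "his"},
    {"phe", "tyr", "trp", "his"},
]


def is_conservative(native: str, replacement: str) -> bool:
    """True if two different residues share at least one of the listed groups."""
    a, b = native.lower(), replacement.lower()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(native: list, variant: list) -> bool:
    """True if every position is either unchanged or a conservative substitution.

    Assumes both sequences have the same length (no insertions or deletions).
    """
    if len(native) != len(variant):
        raise ValueError("sketch handles equal-length sequences only")
    return all(
        n.lower() == v.lower() or is_conservative(n, v)
        for n, v in zip(native, variant)
    )
```

Under these groups, replacing lys with arg counts as conservative (group 4),
whereas replacing lys with glu does not, because the two residues share no group.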

Variants may also (or alternatively) be modified by, for example, the
deletion or addition of amino acids that have minimal influence on the immunogenic
properties, secondary structure and hydropathic nature of the polypeptide. For example,
a polypeptide may be conjugated to a signal (or leader) sequence at the N-terminal end
of the protein which co-translationally or post-translationally directs transfer of the
protein. The polypeptide may also be conjugated to a linker or other sequence for ease
of synthesis, purification or identification of the polypeptide (e.g., poly-His), or to
enhance binding of the polypeptide to a solid support. For example, a polypeptide may
be conjugated to an immunoglobulin Fc region.

In a related aspect, combination polypeptides are disclosed. A
"combination polypeptide" is a polypeptide comprising at least one of the above
immunogenic portions and one or more additional immunogenic M. tuberculosis
sequences, which are joined via a peptide linkage into a single amino acid chain. The
sequences may be joined directly (i.e., with no intervening amino acids) or may be
joined by way of a linker sequence (e.g., Gly-Cys-Gly) that does not significantly
diminish the immunogenic properties of the component polypeptides.
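
The two ways of joining component sequences described above can be sketched at
the sequence level as follows. This is an illustration only: the function name
and the short example portions are hypothetical, residues are represented as
lists of three-letter names, and nothing here addresses whether a given linker
preserves immunogenicity.

```python
from typing import List, Optional, Sequence


def join_portions(
    portions: Sequence[Sequence[str]],
    linker: Optional[Sequence[str]] = None,
) -> List[str]:
    """Concatenate portions into one chain, optionally inserting a linker between them."""
    chain: List[str] = []
    for index, portion in enumerate(portions):
        if index > 0 and linker:
            chain.extend(linker)  # linker goes only between adjacent portions
        chain.extend(portion)
    return chain


# Hypothetical short portions, joined directly and via the Gly-Cys-Gly linker
# mentioned in the text.
direct = join_portions([["Asp", "Pro"], ["Ala", "Val"]])
linked = join_portions([["Asp", "Pro"], ["Ala", "Val"]], linker=["Gly", "Cys", "Gly"])
```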

In general, M. tuberculosis antigens, and DNA sequences encoding such
antigens, may be prepared using any of a variety of procedures. For example, soluble
antigens may be isolated from M. tuberculosis culture filtrate by procedures known to
those of ordinary skill in the art, including anion-exchange and reverse phase
chromatography. Purified antigens are then evaluated for their ability to elicit an
appropriate immune response (e.g., cellular) using, for example, the representative
methods described herein. Immunogenic antigens may then be partially sequenced
using techniques such as traditional Edman chemistry. See Edman and Berg, Eur. J.
Biochem. 50:116-132, 1967.

Immunogenic antigens may also be produced recombinantly using a
DNA sequence that encodes the antigen, which has been inserted into an expression
vector and expressed in an appropriate host. DNA molecules encoding soluble antigens
may be isolated by screening an appropriate M. tuberculosis expression library with
anti-sera (e.g., rabbit) raised specifically against soluble M. tuberculosis antigens. DNA
sequences encoding antigens that may or may not be soluble may be identified by
screening an appropriate M. tuberculosis genomic or cDNA expression library with sera
obtained from patients infected with M. tuberculosis. Such screens may generally be
performed using techniques well known to those of ordinary skill in the art, such as
those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratories, Cold Spring Harbor, NY, 1989.

DNA sequences encoding soluble antigens may also be obtained by
screening an appropriate M. tuberculosis cDNA or genomic DNA library for DNA
sequences that hybridize to degenerate oligonucleotides derived from partial amino acid
sequences of isolated soluble antigens. Degenerate oligonucleotide sequences for use in
such a screen may be designed and synthesized, and the screen may be performed, as
described (for example) in Sambrook et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratories, Cold Spring Harbor, NY, 1989 (and references cited
therein). Polymerase chain reaction (PCR) may also be employed, using the above
oligonucleotides in methods well known in the art, to isolate a nucleic acid probe from a
cDNA or genomic library. The library screen may then be performed using the isolated
probe.
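
To make the notion of a degenerate oligonucleotide concrete, the following
minimal sketch back-translates a short partial amino acid sequence into every
DNA sequence that could encode it under the standard genetic code. It is an
illustration under stated assumptions rather than the procedure used in the
examples: the function names are hypothetical, the peptide is given in
one-letter code, and no codon-usage weighting or primer-design constraints are
applied.

```python
from itertools import product
from math import prod

# Standard genetic code, laid out in TCAG order for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons that encode it.
CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def oligo_pool(peptide: str) -> list:
    """Enumerate every DNA sequence encoding the peptide (keep peptides short)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]
```

For the six residues YYWCPG, degeneracy("YYWCPG") gives 2 x 2 x 1 x 2 x 4 x 4 =
128 possible coding sequences, which shows why stretches rich in residues with
few codons (such as Trp and Met) are attractive when choosing a region for
probe design.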

Alternatively, genomic or cDNA libraries derived from M. tuberculosis
may be screened directly using peripheral blood mononuclear cells (PBMCs) or T cell
lines or clones derived from one or more M. tuberculosis-immune individuals. In
general, PBMCs and/or T cells for use in such screens may be prepared as described
below. Direct library screens may generally be performed by assaying pools of
expressed recombinant proteins for the ability to induce proliferation and/or interferon-γ
production in T cells derived from an M. tuberculosis-immune individual.
Alternatively, potential T cell antigens may be first selected based on antibody
reactivity, as described above.

Regardless of the method of preparation, the antigens (and immunogenic 
portions thereof) described herein (which may or may not be soluble) have the ability to 
induce an immunogenic response. More specifically, the antigens have the ability to 
induce proliferation and/or cytokine production (i.e., interferon-γ and/or interleukin-12 
production) in T cells, NK cells, B cells and/or macrophages derived from an 
M. tuberculosis-immune individual. The selection of cell type for use in evaluating an 
immunogenic response to an antigen will, of course, depend on the desired response. For 
example, interleukin-12 production is most readily evaluated using preparations 
containing B cells and/or macrophages. An M. tuberculosis-immune individual is one 
who is considered to be resistant to the development of tuberculosis by virtue of having 
mounted an effective T cell response to M. tuberculosis (i.e., substantially free of 
disease symptoms). Such individuals may be identified based on a strongly positive 
(i.e., greater than about 10 mm diameter induration) intradermal skin test response to 
tuberculosis proteins (PPD) and an absence of any signs or symptoms of tuberculosis 
disease. T cells, NK cells, B cells and macrophages derived from M. tuberculosis-immune 
individuals may be prepared using methods known to those of ordinary skill in 
the art. For example, a preparation of PBMCs (i.e., peripheral blood mononuclear cells) 
may be employed without further separation of component cells. PBMCs may 
generally be prepared, for example, using density centrifugation through Ficoll™ 
(Winthrop Laboratories, NY). T cells for use in the assays described herein may also be 
purified directly from PBMCs. Alternatively, an enriched T cell line reactive against 
mycobacterial proteins, or T cell clones reactive to individual mycobacterial proteins, 
may be employed. Such T cell clones may be generated by, for example, culturing 
PBMCs from M. tuberculosis-immune individuals with mycobacterial proteins for a 
period of 2-4 weeks. This allows expansion of only the mycobacterial protein-specific 
T cells, resulting in a line composed solely of such cells. These cells may then be 
cloned and tested with individual proteins, using methods known to those of ordinary 
skill in the art, to more accurately define individual T cell specificity. In general, 
antigens that test positive in assays for proliferation and/or cytokine production (i.e., 
interferon-γ and/or interleukin-12 production) performed using T cells, NK cells, B cells 
and/or macrophages derived from an M. tuberculosis-immune individual are considered 
immunogenic. Such assays may be performed, for example, using the representative 
procedures described below. Immunogenic portions of such antigens may be identified 
using similar assays, and may be present within the polypeptides described herein. 

The ability of a polypeptide (e.g., an immunogenic antigen, or a portion 
or other variant thereof) to induce cell proliferation is evaluated by contacting the cells 
(e.g., T cells and/or NK cells) with the polypeptide and measuring the proliferation of 
the cells. In general, the amount of polypeptide that is sufficient for evaluation of about 
10⁵ cells ranges from about 10 ng/mL to about 100 µg/mL and preferably is about 
10 µg/mL. The incubation of polypeptide with cells is typically performed at 37°C for 
about six days. Following incubation with polypeptide, the cells are assayed for a 
proliferative response, which may be evaluated by methods known to those of ordinary 
skill in the art, such as exposing cells to a pulse of radiolabeled thymidine and 
measuring the incorporation of label into cellular DNA. In general, a polypeptide that 
results in at least a three fold increase in proliferation above background (i.e., the 
proliferation observed for cells cultured without polypeptide) is considered to be able to 
induce proliferation. 
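
By way of illustration only, the three fold criterion above may be expressed as a ratio of thymidine incorporation with and without polypeptide. The following is a minimal sketch; the function names and the use of counts per minute (cpm) averaged over replicate wells are assumptions made for the example and are not part of the procedures described herein.

```python
from statistics import mean


def stimulation_index(stimulated_cpm, background_cpm):
    """Ratio of mean label incorporation with polypeptide to mean background.

    Both arguments are lists of counts per minute from replicate wells
    (hypothetical input format chosen for this sketch).
    """
    background = mean(background_cpm)
    if background <= 0:
        raise ValueError("background counts must be positive")
    return mean(stimulated_cpm) / background


def induces_proliferation(stimulated_cpm, background_cpm, fold_threshold=3.0):
    """True when proliferation is at least `fold_threshold` times background."""
    return stimulation_index(stimulated_cpm, background_cpm) >= fold_threshold


if __name__ == "__main__":
    # Hypothetical counts: cells with polypeptide vs. cells in medium alone.
    print(induces_proliferation([3100, 3300], [1000, 1000]))
```

Averaging replicates before taking the ratio is one possible design choice; a stricter variant could require each replicate to exceed the threshold individually.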

The ability of a polypeptide to stimulate the production of interferon-γ 
and/or interleukin-12 in cells may be evaluated by contacting the cells with the 
polypeptide and measuring the level of interferon-γ or interleukin-12 produced by the 
cells. In general, the amount of polypeptide that is sufficient for the evaluation of about 
10⁵ cells ranges from about 10 ng/mL to about 100 µg/mL and preferably is about 
10 µg/mL. The polypeptide may, but need not, be immobilized on a solid support, such 
as a bead or a biodegradable microsphere, such as those described in U.S. Patent 
Nos. 4,897,268 and 5,075,109. The incubation of polypeptide with the cells is typically 
performed at 37°C for about six days. Following incubation with polypeptide, the cells 
are assayed for interferon-γ and/or interleukin-12 (or one or more subunits thereof), 
which may be evaluated by methods known to those of ordinary skill in the art, such as 
an enzyme-linked immunosorbent assay (ELISA) or, in the case of IL-12 P70 subunit, a 
bioassay such as an assay measuring proliferation of T cells. In general, a polypeptide 
that results in the production of at least 50 pg of interferon-γ per mL of culture 
supernatant (containing 10⁴-10⁵ T cells per mL) is considered able to stimulate the 
production of interferon-γ. A polypeptide that stimulates the production of at least 
10 pg/mL of IL-12 P70 subunit, and/or at least 100 pg/mL of IL-12 P40 subunit, per 10⁵ 
macrophages or B cells (or per 3 x 10⁵ PBMC) is considered able to stimulate the 
production of IL-12. 
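
The cytokine thresholds stated above may be summarized, purely as an illustrative sketch, in the following form. The constant and function names are hypothetical, and the sketch assumes that the measured concentrations have already been normalized to the cell numbers recited above.

```python
# Thresholds taken from the text above (pg per mL of culture supernatant).
IFN_GAMMA_MIN_PG_PER_ML = 50.0
IL12_P70_MIN_PG_PER_ML = 10.0
IL12_P40_MIN_PG_PER_ML = 100.0


def stimulates_ifn_gamma(ifn_gamma_pg_per_ml):
    """True when interferon-gamma production reaches the stated threshold."""
    return ifn_gamma_pg_per_ml >= IFN_GAMMA_MIN_PG_PER_ML


def stimulates_il12(p70_pg_per_ml=None, p40_pg_per_ml=None):
    """True when either IL-12 subunit reaches its stated threshold.

    A subunit that was not measured is passed as None and ignored.
    """
    p70_ok = p70_pg_per_ml is not None and p70_pg_per_ml >= IL12_P70_MIN_PG_PER_ML
    p40_ok = p40_pg_per_ml is not None and p40_pg_per_ml >= IL12_P40_MIN_PG_PER_ML
    return p70_ok or p40_ok


if __name__ == "__main__":
    # Hypothetical measurements for a single polypeptide.
    print(stimulates_ifn_gamma(75.0), stimulates_il12(p70_pg_per_ml=4.0, p40_pg_per_ml=150.0))
```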

In general, immunogenic antigens are those antigens that stimulate 
proliferation and/or cytokine production (i.e., interferon-γ and/or interleukin-12 
production) in T cells, NK cells, B cells and/or macrophages derived from at least about 
25% of M. tuberculosis-immune individuals. Among these immunogenic antigens, 
polypeptides having superior therapeutic properties may be distinguished based on the 
magnitude of the responses in the above assays and based on the percentage of 
individuals for which a response is observed. In addition, antigens having superior 
therapeutic properties will not stimulate proliferation and/or cytokine production in 
vitro in cells derived from more than about 25% of individuals that are not 
M. tuberculosis-immune, thereby eliminating responses that are not specifically due to 
M. tuberculosis-responsive cells. Those antigens that induce a response in a high 
percentage of T cell, NK cell, B cell and/or macrophage preparations from 
M. tuberculosis-immune individuals (with a low incidence of responses in cell 
preparations from other individuals) have superior therapeutic properties. 
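
The donor-based selection described above may be sketched as follows. This is an illustrative example only: each donor is reduced to a single positive or negative result (an assumption made for simplicity), and the names `responder_fraction` and `classify_antigen` are hypothetical.

```python
def responder_fraction(responses):
    """Fraction of donors whose cells responded (responses are booleans)."""
    responses = list(responses)
    if not responses:
        raise ValueError("at least one donor is required")
    return sum(1 for r in responses if r) / len(responses)


def classify_antigen(immune_responses, non_immune_responses,
                     min_immune=0.25, max_non_immune=0.25):
    """Apply the two 25% criteria from the text to per-donor results."""
    immune = responder_fraction(immune_responses)
    non_immune = responder_fraction(non_immune_responses)
    immunogenic = immune >= min_immune
    specific = non_immune <= max_non_immune
    return {
        "immune_fraction": immune,
        "non_immune_fraction": non_immune,
        "immunogenic": immunogenic,
        "superior_candidate": immunogenic and specific,
    }


if __name__ == "__main__":
    # Hypothetical panel: four immune donors and four non-immune donors.
    print(classify_antigen([True, True, False, True], [False, False, False, True]))
```

The sketch does not model the magnitude of each response, which the text also identifies as a basis for distinguishing superior antigens.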

Antigens with superior therapeutic properties may also be identified 
based on their ability to diminish the severity of M. tuberculosis infection in 
experimental animals, when administered as a vaccine. Suitable vaccine preparations 
for use on experimental animals are described in detail below. Efficacy may be 
determined based on the ability of the antigen to provide at least about a 50% reduction 
in bacterial numbers and/or at least about a 40% decrease in mortality following 
experimental infection. Suitable experimental animals include mice, guinea pigs and 
primates. 
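
The efficacy criteria above are simple percentage reductions relative to a control group. A minimal sketch is given below, assuming that bacterial numbers and mortality are each summarized as one value for unvaccinated control animals and one value for vaccinated animals; the function names are hypothetical.

```python
def percent_reduction(control, vaccinated):
    """Percentage by which the vaccinated value is lower than the control value."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (control - vaccinated) / control


def is_efficacious(control_cfu, vaccinated_cfu,
                   control_mortality, vaccinated_mortality):
    """True when either threshold recited in the text is met."""
    bacterial_ok = percent_reduction(control_cfu, vaccinated_cfu) >= 50.0
    mortality_ok = percent_reduction(control_mortality, vaccinated_mortality) >= 40.0
    return bacterial_ok or mortality_ok


if __name__ == "__main__":
    # Hypothetical values: bacterial counts per organ and fraction of animals dying.
    print(is_efficacious(1_000_000, 400_000, 0.8, 0.7))
```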

Antigens having superior diagnostic properties may generally be 
identified based on the ability to elicit a response in an intradermal skin test performed 
on an individual with active tuberculosis, but not in a test performed on an individual 
who is not infected with M. tuberculosis. Skin tests may generally be performed as 
described below, with a response of at least 5 mm induration considered positive. 

Immunogenic portions of the antigens described herein may be prepared 
and identified using well known techniques, such as those summarized in Paul, 
Fundamental Immunology, 3d ed., Raven Press, 1993, pp. 243-247 and references cited 
therein. Such techniques include screening polypeptide portions of the native antigen 
for immunogenic properties. The representative proliferation and cytokine production 
assays described herein may generally be employed in these screens. An immunogenic 
portion of a polypeptide is a portion that, within such representative assays, generates 
an immune response (e.g., proliferation, interferon-γ production and/or interleukin-12 
production) that is substantially similar to that generated by the full length antigen. In 
other words, an immunogenic portion of an antigen may generate at least about 20%, 
and preferably about 100%, of the proliferation induced by the full length antigen in the 
model proliferation assay described herein. An immunogenic portion may also, or 
alternatively, stimulate the production of at least about 20%, and preferably about 
100%, of the interferon-γ and/or interleukin-12 induced by the full length antigen in the 
model assay described herein. 
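
The comparison of a portion with the full length antigen may be sketched as a relative response. The example below is illustrative only; it assumes both readouts (proliferation or cytokine level) were obtained in the same assay and have already had background subtracted, and the function names are hypothetical.

```python
def relative_response(portion_response, full_length_response):
    """Response to the portion as a percentage of the full length response."""
    if full_length_response <= 0:
        raise ValueError("full length response must be positive")
    return 100.0 * portion_response / full_length_response


def is_immunogenic_portion(portion_response, full_length_response, min_percent=20.0):
    """True when the portion reaches at least `min_percent` of the full length response."""
    return relative_response(portion_response, full_length_response) >= min_percent


if __name__ == "__main__":
    # Hypothetical background-subtracted readouts in arbitrary units.
    print(is_immunogenic_portion(450.0, 1500.0))
```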

Portions and other variants of M. tuberculosis antigens may be generated 
by synthetic or recombinant means. Synthetic polypeptides having fewer than about 
100 amino acids, and generally fewer than about 50 amino acids, may be generated 
using techniques well known to those of ordinary skill in the art. For example, such 
polypeptides may be synthesized using any of the commercially available solid-phase 
techniques, such as the Merrifield solid-phase synthesis method, where amino acids are 
sequentially added to a growing amino acid chain. See Merrifield, J. Am. Chem. Soc. 
85:2149-2154, 1963. Equipment for automated synthesis of polypeptides is 
commercially available from suppliers such as Applied BioSystems, Inc., Foster City, 
CA, and may be operated according to the manufacturer's instructions. Variants of a 
native antigen may generally be prepared using standard mutagenesis techniques, such 
as oligonucleotide-directed site-specific mutagenesis. Sections of the DNA sequence 
may also be removed using standard techniques to permit preparation of truncated 
polypeptides. 

Recombinant polypeptides containing portions and/or variants of a 
native antigen may be readily prepared from a DNA sequence encoding the polypeptide 
using a variety of techniques well known to those of ordinary skill in the art. For 
example, supernatants from suitable host/vector systems which secrete recombinant 
protein into culture media may be first concentrated using a commercially available 
filter. Following concentration, the concentrate may be applied to a suitable 
purification matrix such as an affinity matrix or an ion exchange resin. Finally, one or 
more reverse phase HPLC steps can be employed to further purify a recombinant 
protein. 

Any of a variety of expression vectors known to those of ordinary skill in 
the art may be employed to express recombinant polypeptides of this invention. 
Expression may be achieved in any appropriate host cell that has been transformed or 
transfected with an expression vector containing a DNA molecule that encodes a 
recombinant polypeptide. Suitable host cells include prokaryotes, yeast and higher 
eukaryotic cells. Preferably, the host cells employed are E. coli, yeast or a mammalian 
cell line such as COS or CHO. The DNA sequences expressed in this manner may 
encode naturally occurring antigens, portions of naturally occurring antigens, or other 
variants thereof. 

In general, regardless of the method of preparation, the polypeptides 
disclosed herein are prepared in substantially pure form. Preferably, the polypeptides 
are at least about 80% pure, more preferably at least about 90% pure and most 
preferably at least about 99% pure. In certain preferred embodiments, described in 
detail below, the substantially pure polypeptides are incorporated into pharmaceutical 
compositions or vaccines for use in one or more of the methods disclosed herein. 

In certain specific embodiments, the subject invention discloses 
polypeptides comprising at least an immunogenic portion of a soluble M. tuberculosis 
antigen having one of the following N-terminal sequences, or a variant thereof that 
differs only in conservative substitutions and/or modifications: 

(a) Asp-Pro-Val-Asp-Ala-Val-Ile-Asn-Thr-Thr-Cys-Asn-Tyr-Gly- 
Gln-Val-Val-Ala-Ala-Leu; (SEQ ID No. 120) 

(b) Ala-Val-Glu-Ser-Gly-Met-Leu-Ala-Leu-Gly-Thr-Pro-Ala-Pro- 
Ser; (SEQ ID No. 121) 

(c) Ala-Ala-Met-Lys-Pro-Arg-Thr-Gly-Asp-Gly-Pro-Leu-Glu-Ala- 
Ala-Lys-Glu-Gly-Arg; (SEQ ID No. 122) 

(d) Tyr-Tyr-Trp-Cys-Pro-Gly-Gln-Pro-Phe-Asp-Pro-Ala-Trp-Gly- 
Pro; (SEQ ID No. 123) 

(e) Asp-Ile-Gly-Ser-Glu-Ser-Thr-Glu-Asp-Gln-Gln-Xaa-Ala-Val; 
(SEQ ID No. 124) 

(f) Ala-Glu-Glu-Ser-Ile-Ser-Thr-Xaa-Glu-Xaa-Ile-Val-Pro; (SEQ ID 
No. 125) 

(g) Asp-Pro-Glu-Pro-Ala-Pro-Pro-Val-Pro-Thr-Ala-Ala-Ala-Ser- 
Pro-Pro-Ser; (SEQ ID No. 126) 

(h) Ala-Pro-Lys-Thr-Tyr-Xaa-Glu-Glu-Leu-Lys-Gly-Thr-Asp-Thr- 
Gly; (SEQ ID No. 127) 

(i) Asp-Pro-Ala-Ser-Ala-Pro-Asp-Val-Pro-Thr-Ala-Ala-Gln-Leu- 
Thr-Ser-Leu-Leu-Asn-Ser-Leu-Ala-Asp-Pro-Asn-Val-Ser-Phe- 
Ala-Asn; (SEQ ID No. 128) 

(j) Xaa-Asp-Ser-Glu-Lys-Ser-Ala-Thr-Ile-Lys-Val-Thr-Asp-Ala- 
Ser; (SEQ ID No. 134) 

(k) Ala-Gly-Asp-Thr-Xaa-Ile-Tyr-Ile-Val-Gly-Asn-Leu-Thr-Ala- 
Asp; (SEQ ID No. 135) or 

(l) Ala-Pro-Glu-Ser-Gly-Ala-Gly-Leu-Gly-Gly-Thr-Val-Gln-Ala- 
Gly; (SEQ ID No. 136) 

wherein Xaa may be any amino acid, preferably a cysteine residue. A DNA sequence 
encoding the antigen identified as (g) above is provided in SEQ ID No. 52, and the 
polypeptide encoded by SEQ ID No. 52 is provided in SEQ ID No. 53. A DNA 
sequence encoding the antigen defined as (a) above is provided in SEQ ID No. 101; its 
deduced amino acid sequence is provided in SEQ ID No. 102. A DNA sequence 
corresponding to antigen (d) above is provided in SEQ ID No. 24, a DNA sequence 
corresponding to antigen (c) is provided in SEQ ID No. 25, and a DNA sequence 
corresponding to antigen (i) is provided in SEQ ID No. 99; its deduced amino acid 
sequence is provided in SEQ ID No. 100. 

In a further specific embodiment, the subject invention discloses 
polypeptides comprising at least an immunogenic portion of an M. tuberculosis antigen 
having one of the following N-terminal sequences, or a variant thereof that differs only 
in conservative substitutions and/or modifications: 

(m) Xaa-Tyr-Ile-Ala-Tyr-Xaa-Thr-Thr-Ala-Gly-Ile-Val-Pro-Gly-Lys- 
Ile-Asn-Val-His-Leu-Val; (SEQ ID No. 137) or 

(n) Asp-Pro-Pro-Asp-Pro-His-Gln-Xaa-Asp-Met-Thr-Lys-Gly-Tyr- 
Tyr-Pro-Gly-Gly-Arg-Arg-Xaa-Phe; (SEQ ID No. 129) 

wherein Xaa may be any amino acid, preferably a cysteine residue. 

In other specific embodiments, the subject invention discloses 
polypeptides comprising at least an immunogenic portion of a soluble M. tuberculosis 
antigen (or a variant of such an antigen) that comprises one or more of the amino acid 
sequences encoded by (a) the DNA sequences of SEQ ID Nos.: 1, 2, 4-10, 13-25 and 
52; (b) the complements of such DNA sequences, or (c) DNA sequences substantially 
homologous to a sequence in (a) or (b). 

In further specific embodiments, the subject invention discloses 
polypeptides comprising at least an immunogenic portion of an M. tuberculosis antigen 
(or a variant of such an antigen), which may or may not be soluble, that comprises one 
or more of the amino acid sequences encoded by (a) the DNA sequences of SEQ ID 
Nos.: 26-51, 138, 139, 163-183 and 201, (b) the complements of such DNA sequences 
or (c) DNA sequences substantially homologous to a sequence in (a) or (b). 

In the specific embodiments discussed above, the M. tuberculosis 
antigens include variants that are encoded by DNA sequences which are substantially 
homologous to one or more of the DNA sequences specifically recited herein. "Substantial 
homology," as used herein, refers to DNA sequences that are capable of hybridizing 
under moderately stringent conditions. Suitable moderately stringent conditions include 
prewashing in a solution of 5X SSC, 0.5% SDS, 1.0 mM EDTA (pH 8.0); hybridizing 
at 50°C-65°C, 5X SSC, overnight or, in the case of cross-species homology, at 45°C, 
0.5X SSC; followed by washing twice at 65°C for 20 minutes with each of 2X, 0.5X 
and 0.2X SSC containing 0.1% SDS. Such hybridizing DNA sequences are also 
within the scope of this invention, as are nucleotide sequences that, due to code 
degeneracy, encode an immunogenic polypeptide that is encoded by a hybridizing DNA 
sequence. 

In a related aspect, the present invention provides fusion proteins 
comprising a first and a second inventive polypeptide or, alternatively, a polypeptide of 
the present invention and a known M. tuberculosis antigen, such as the 38 kD antigen 
described in Andersen and Hansen, Infect. Immun. 57:2481-2488, 1989, (Genbank 
Accession No. M30046) or ESAT-6 (SEQ ID Nos. 103 and 104), together with variants 
of such fusion proteins. The fusion proteins of the present invention may also include a 
linker peptide between the first and second polypeptides. 

A DNA sequence encoding a fusion protein of the present invention is 
constructed using known recombinant DNA techniques to assemble separate DNA 
sequences encoding the first and second polypeptides into an appropriate expression 
vector. The 3' end of a DNA sequence encoding the first polypeptide is ligated, with or 
without a peptide linker, to the 5' end of a DNA sequence encoding the second 
polypeptide so that the reading frames of the sequences are in phase to permit mRNA 
translation of the two DNA sequences into a single fusion protein that retains the 
biological activity of both the first and the second polypeptides. 
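
The requirement that the reading frames be in phase may be illustrated computationally. The following is a minimal sketch under stated assumptions: the coding sequences are supplied as plain DNA strings beginning at the first codon, the terminal stop codon of the first sequence is removed before joining, and the function names are hypothetical. It checks frame only and is not a substitute for the recombinant DNA techniques referred to above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA string into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def strip_terminal_stop(orf):
    """Remove a trailing stop codon so translation continues into the next segment."""
    return orf[:-3] if orf[-3:] in STOP_CODONS else orf


def build_fusion_orf(first_orf, second_orf, linker_dna=""):
    """Join first + optional linker + second, verifying the reading frame is preserved."""
    first = strip_terminal_stop(first_orf.upper())
    linker = linker_dna.upper()
    second = second_orf.upper()
    for name, part in (("first", first), ("linker", linker), ("second", second)):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} segment length is not a multiple of 3")
    fusion = first + linker + second
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(c in STOP_CODONS for c in codons(fusion)[:-1]):
        raise ValueError("internal stop codon in fusion sequence")
    return fusion


if __name__ == "__main__":
    # Toy sequences for illustration; GGTTCT encodes a Gly-Ser linker.
    print(build_fusion_orf("ATGGCTTAA", "ATGAAATGA", linker_dna="GGTTCT"))
```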

A peptide linker sequence may be employed to separate the first and the 
second polypeptides by a distance sufficient to ensure that each polypeptide folds into 
its secondary and tertiary structures. Such a peptide linker sequence is incorporated into 
the fusion protein using standard techniques well known in the art. Suitable peptide 
linker sequences may be chosen based on the following factors: (1) their ability to 
adopt a flexible extended conformation; (2) their inability to adopt a secondary structure 
that could interact with functional epitopes on the first and second polypeptides; and 
(3) the lack of hydrophobic or charged residues that might react with the polypeptide 
functional epitopes. Preferred peptide linker sequences contain Gly, Asn and Ser 
residues. Other near neutral amino acids, such as Thr and Ala, may also be used in the 
linker sequence. Amino acid sequences which may be usefully employed as linkers 
include those disclosed in Maratea et al., Gene 40:39-46, 1985; Murphy et al., Proc. 
Natl. Acad. Sci. USA 83:8258-8262, 1986; U.S. Patent No. 4,935,233 and U.S. Patent 
No. 4,751,180. The linker sequence may be from 1 to about 50 amino acids in length. 
Peptide linker sequences are not required when the first and second polypeptides have 
non-essential N-terminal amino acid regions that can be used to separate the functional 
domains and prevent steric interference. 
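
A candidate linker may be checked against the composition and length guidance above with a short script. This is a sketch only: it uses one-letter amino acid codes and, as a simplifying assumption stricter than the text, treats every residue other than Gly, Asn, Ser, Thr and Ala as disallowed. Conformational properties (factors (1) and (2) above) are not assessed. All names are hypothetical.

```python
PREFERRED_RESIDUES = set("GNS")    # Gly, Asn, Ser
NEAR_NEUTRAL_RESIDUES = set("TA")  # Thr, Ala
ALLOWED_RESIDUES = PREFERRED_RESIDUES | NEAR_NEUTRAL_RESIDUES


def evaluate_linker(linker, max_length=50):
    """Report length and composition checks for a one-letter-code linker."""
    seq = linker.upper()
    length_ok = 1 <= len(seq) <= max_length
    disallowed = sorted({aa for aa in seq if aa not in ALLOWED_RESIDUES})
    preferred_fraction = (
        sum(aa in PREFERRED_RESIDUES for aa in seq) / len(seq) if seq else 0.0
    )
    return {
        "length_ok": length_ok,
        "disallowed_residues": disallowed,
        "preferred_fraction": preferred_fraction,
        "acceptable": length_ok and not disallowed,
    }


if __name__ == "__main__":
    # Hypothetical Gly/Ser linker.
    print(evaluate_linker("GGSGGS"))
```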

The ligated DNA sequences are operably linked to suitable 
transcriptional or translational regulatory elements. The regulatory elements 
responsible for expression of DNA are located only 5' to the DNA sequence encoding 
the first polypeptide. Similarly, stop codons required to end translation and 
transcription termination signals are only present 3' to the DNA sequence encoding the 
second polypeptide. 

In another aspect, the present invention provides methods for using one 
or more of the above polypeptides or fusion proteins (or DNA molecules encoding such 
polypeptides) to induce protective immunity against tuberculosis in a patient. As used 
herein, a "patient" refers to any warm-blooded animal, preferably a human. A patient 
may be afflicted with a disease, or may be free of detectable disease and/or infection. In 
other words, protective immunity may be induced to prevent or treat tuberculosis. 






In this aspect, the polypeptide, fusion protein or DNA molecule is 
generally present within a pharmaceutical composition and/or a vaccine. 
Pharmaceutical compositions may comprise one or more polypeptides, each of which 
may contain one or more of the above sequences (or variants thereof), and a 
physiologically acceptable carrier. Vaccines may comprise one or more of the above 
polypeptides and a non-specific immune response enhancer, such as an adjuvant or a 
liposome (into which the polypeptide is incorporated). Such pharmaceutical 
compositions and vaccines may also contain other M. tuberculosis antigens, either 
incorporated into a combination polypeptide or present within a separate polypeptide. 

Alternatively, a vaccine may contain DNA encoding one or more 
polypeptides as described above, such that the polypeptide is generated in situ. In such 
vaccines, the DNA may be present within any of a variety of delivery systems known to 
those of ordinary skill in the art, including nucleic acid expression systems, bacterial 
and viral expression systems. Appropriate nucleic acid expression systems contain the 
necessary DNA sequences for expression in the patient (such as a suitable promoter and 
terminating signal). Bacterial delivery systems involve the administration of a 
bacterium (such as Bacillus-Calmette-Guerin) that expresses an immunogenic portion 
of the polypeptide on its cell surface. In a preferred embodiment, the DNA may be 
introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, 
or adenovirus), which may involve the use of a non-pathogenic (defective), replication 
competent virus. Techniques for incorporating DNA into such expression systems are 
well known to those of ordinary skill in the art. The DNA may also be "naked," as 
described, for example, in Ulmer et al., Science 259:1745-1749, 1993 and reviewed by 
Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may be increased by 
coating the DNA onto biodegradable beads, which are efficiently transported into the 
cells. 

In a related aspect, a DNA vaccine as described above may be 
administered simultaneously with or sequentially to either a polypeptide of the present 
invention or a known M. tuberculosis antigen, such as the 38 kD antigen described 
above. For example, administration of DNA encoding a polypeptide of the present 
invention, either "naked" or in a delivery system as described above, may be followed 
by administration of an antigen in order to enhance the protective immune effect of the 
vaccine. 

Routes and frequency of administration, as well as dosage, will vary 
from individual to individual and may parallel those currently being used in 
immunization using BCG. In general, the pharmaceutical compositions and vaccines 
may be administered by injection (e.g., intracutaneous, intramuscular, intravenous or 
subcutaneous), intranasally (e.g., by aspiration) or orally. Between 1 and 3 doses may 
be administered for a 1-36 week period. Preferably, 3 doses are administered, at 
intervals of 3-4 months, and booster vaccinations may be given periodically thereafter. 
Alternate protocols may be appropriate for individual patients. A suitable dose is an 
amount of polypeptide or DNA that, when administered as described above, is capable 
of raising an immune response in an immunized patient sufficient to protect the patient 
from M. tuberculosis infection for at least 1-2 years. In general, the amount of 
polypeptide present in a dose (or produced in situ by the DNA in a dose) ranges from 
about 1 pg to about 100 mg per kg of host, typically from about 10 pg to about 1 mg, 
and preferably from about 100 pg to about 1 µg. Suitable dose sizes will vary with the 
size of the patient, but will typically range from about 0.1 mL to about 5 mL. 

While any suitable carrier known to those of ordinary skill in the art may 
be employed in the pharmaceutical compositions of this invention, the type of carrier 
will vary depending on the mode of administration. For parenteral administration, such 
as subcutaneous injection, the carrier preferably comprises water, saline, alcohol, a fat, a 
wax or a buffer. For oral administration, any of the above carriers or a solid carrier, 
such as mannitol, lactose, starch, magnesium stearate, sodium saccharine, talcum, 
cellulose, glucose, sucrose, and magnesium carbonate, may be employed. 
Biodegradable microspheres (e.g., polylactic galactide) may also be employed as 
carriers for the pharmaceutical compositions of this invention. Suitable biodegradable 
microspheres are disclosed, for example, in U.S. Patent Nos. 4,897,268 and 5,075,109. 

Any of a variety of adjuvants may be employed in the vaccines of this 
invention to nonspecifically enhance the immune response. Most adjuvants contain a 
substance designed to protect the antigen from rapid catabolism, such as aluminum 
hydroxide or mineral oil, and a nonspecific stimulator of immune responses, such as 
lipid A, Bordetella pertussis or Mycobacterium tuberculosis. Suitable adjuvants are 
commercially available as, for example, Freund's Incomplete Adjuvant and Freund's 
Complete Adjuvant (Difco Laboratories) and Merck Adjuvant 65 (Merck and 
Company, Inc., Rahway, NJ). Other suitable adjuvants include alum, biodegradable 
microspheres, monophosphoryl lipid A and quil A. 

In another aspect, this invention provides methods for using one or more 
of the polypeptides described above to diagnose tuberculosis using a skin test. As used 
herein, a "skin test" is any assay performed directly on a patient in which a delayed-type 
hypersensitivity (DTH) reaction (such as swelling, reddening or dermatitis) is measured 
following intradermal injection of one or more polypeptides as described above. Such 
injection may be achieved using any suitable device sufficient to contact the 
polypeptide or polypeptides with dermal cells of the patient, such as a tuberculin 
syringe or 1 mL syringe. Preferably, the reaction is measured at least 48 hours after 
injection, more preferably 48-72 hours. 

The DTH reaction is a cell-mediated immune response, which is greater 
in patients that have been exposed previously to the test antigen (i.e., the immunogenic 
portion of the polypeptide employed, or a variant thereof). The response may be 
measured visually, using a ruler. In general, a response that is greater than about 0.5 cm 
in diameter, preferably greater than about 1.0 cm in diameter, is a positive response, 
indicative of tuberculosis infection, which may or may not be manifested as an active 
disease. 
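
The reading of a skin test response against the diameters recited above may be sketched as follows. The example is illustrative only; the category labels and the function name are hypothetical, and the diameter is assumed to have been measured in centimeters.

```python
def classify_skin_test(diameter_cm, positive_cm=0.5, preferred_cm=1.0):
    """Classify a measured response diameter against the stated thresholds."""
    if diameter_cm < 0:
        raise ValueError("diameter cannot be negative")
    if diameter_cm > preferred_cm:
        return "positive (exceeds preferred threshold)"
    if diameter_cm > positive_cm:
        return "positive"
    return "negative"


if __name__ == "__main__":
    # Hypothetical readings taken 48-72 hours after injection.
    for reading in (0.3, 0.8, 1.4):
        print(reading, classify_skin_test(reading))
```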

The polypeptides of this invention are preferably formulated, for use in a 
skin test, as pharmaceutical compositions containing a polypeptide and a 
physiologically acceptable carrier, as described above. Such compositions typically 
contain one or more of the above polypeptides in an amount ranging from about 1 µg to 
about 100 µg, preferably from about 10 µg to about 50 µg in a volume of 0.1 mL. 
Preferably, the carrier employed in such pharmaceutical compositions is a saline 
solution with appropriate preservatives, such as phenol and/or Tween 80™. 

In a preferred embodiment, a polypeptide employed in a skin test is of 
sufficient size such that it remains at the site of injection for the duration of the reaction 
period. In general, a polypeptide that is at least 9 amino acids in length is sufficient. 
The polypeptide is also preferably broken down by macrophages within hours of 
injection to allow presentation to T-cells. Such polypeptides may contain repeats of one 
or more of the above sequences and/or other immunogenic or nonimmunogenic 
sequences. 

The following Examples are offered by way of illustration and not by 
way of limitation. 

EXAMPLES 

EXAMPLE 1 

Purification and Characterization of Polypeptides 
from M. tuberculosis Culture Filtrate 

This example illustrates the preparation of M. tuberculosis soluble 
polypeptides from culture filtrate. Unless otherwise noted, all percentages in the 
following example are weight per volume. 

M. tuberculosis (either H37Ra, ATCC No. 25177, or H37Rv, ATCC 
No. 25618) was cultured in sterile GAS media at 37°C for fourteen days. The media 
was then vacuum filtered (leaving the bulk of the cells) through a 0.45 µ filter into a 
sterile 2.5 L bottle. The media was next filtered through a 0.2 µ filter into a sterile 4 L 
bottle and NaN3 was added to the culture filtrate to a concentration of 0.04%. The 
bottles were then placed in a 4°C cold room. 

The culture filtrate was concentrated by placing the filtrate in a 12 L 
reservoir that had been autoclaved and feeding the filtrate into a 400 ml Amicon stir cell 
which had been rinsed with ethanol and contained a 10,000 kDa MWCO membrane. 
The pressure was maintained at 60 psi using nitrogen gas. This procedure reduced the 
12 L volume to approximately 50 ml. 

The culture filtrate was dialyzed into 0.1% ammonium bicarbonate using 
an 8,000 kDa MWCO cellulose ester membrane, with two changes of ammonium 
bicarbonate solution. Protein concentration was then determined by a commercially 
available BCA assay (Pierce, Rockford, IL). 

The dialyzed culture filtrate was then lyophilized, and the polypeptides 
resuspended in distilled water. The polypeptides were dialyzed against 0.01 mM 1,3 
bis[tris(hydroxymethyl)-methylamino]propane, pH 7.5 (Bis-Tris propane buffer), the 
initial conditions for anion exchange chromatography. Fractionation was performed 
using gel profusion chromatography on a POROS 146 II Q/M anion exchange column 
4.6 mm x 100 mm (Perseptive BioSystems, Framingham, MA) equilibrated in 0.01 mM 
Bis-Tris propane buffer pH 7.5. Polypeptides were eluted with a linear 0-0.5 M NaCl 
gradient in the above buffer system. The column eluent was monitored at a wavelength 
of 220 nm. 

The pools of polypeptides eluting from the ion exchange column were 
dialyzed against distilled water and lyophilized. The resulting material was dissolved in 
0.1% trifluoroacetic acid (TFA) pH 1.9 in water, and the polypeptides were purified on 
a Delta-Pak C18 column (Waters, Milford, MA) 300 Angstrom pore size, 5 micron 
particle size (3.9 x 150 mm). The polypeptides were eluted from the column with a 
linear gradient from 0-60% dilution buffer (0.1% TFA in acetonitrile). The flow rate 
was 0.75 ml/minute and the HPLC eluent was monitored at 214 nm. Fractions 
containing the eluted polypeptides were collected to maximize the purity of the 
individual samples. Approximately 200 purified polypeptides were obtained. 

The purified polypeptides were then screened for the ability to induce T-cell 
proliferation in PBMC preparations. The PBMCs from donors known to be PPD 
skin test positive and whose T-cells were shown to proliferate in response to PPD and 
crude soluble proteins from MTB were cultured in medium comprising RPMI 1640 
supplemented with 10% pooled human serum and 50 µg/ml gentamicin. Purified 
polypeptides were added in duplicate at concentrations of 0.5 to 10 µg/mL. After six 
days of culture in 96-well round-bottom plates in a volume of 200 µl, 50 µl of medium 
was removed from each well for determination of IFN-γ levels, as described below. 
The plates were then pulsed with 1 µCi/well of tritiated thymidine for a further 18 
hours, harvested and tritium uptake determined using a gas scintillation counter. 
Fractions that resulted in proliferation in both replicates three fold greater than the 
proliferation observed in cells cultured in medium alone were considered positive. 
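
The positivity criterion for the proliferation assay can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name, the use of counts per minute (cpm) as the unit of tritium uptake, and the use of a single medium-alone value (for example, a mean of control wells) are not specified in the text above.

```python
def is_proliferation_positive(replicate_cpm, medium_cpm, fold=3.0):
    """Return True when every replicate exceeds `fold` times the medium-alone value.

    replicate_cpm: tritium uptake for each replicate well (assumed unit: cpm).
    medium_cpm:    uptake for cells cultured in medium alone (assumed to be a
                   single summary value, e.g. the mean of the control wells).
    fold:          required fold increase; 3.0 follows the three-fold rule above.
    """
    threshold = fold * medium_cpm
    return all(cpm > threshold for cpm in replicate_cpm)


# Hypothetical readings for one fraction tested in duplicate.
print(is_proliferation_positive([3500, 4100], medium_cpm=1000))
print(is_proliferation_positive([3500, 2900], medium_cpm=1000))
```

Requiring both replicates to pass, rather than their mean, follows the wording "in both replicates" and guards against a single outlying well.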

IFN-γ was measured using an enzyme-linked immunosorbent assay 
(ELISA). ELISA plates were coated with a mouse monoclonal antibody directed to 
human IFN-γ (PharMingen, San Diego, CA) in PBS for four hours at room temperature. 
Wells were then blocked with PBS containing 5% (w/v) non-fat dried milk for 1 hour 
at room temperature. The plates were then washed six times in PBS/0.2% TWEEN-20 
and samples diluted 1:2 in culture medium in the ELISA plates were incubated 
overnight at room temperature. The plates were again washed and a polyclonal rabbit 
anti-human IFN-γ serum diluted 1:3000 in PBS/10% normal goat serum was added to 
each well. The plates were then incubated for two hours at room temperature, washed 
and horseradish peroxidase-coupled anti-rabbit IgG (Sigma Chemical Co., St. Louis, 
MO) was added at a 1:2000 dilution in PBS/5% non-fat dried milk. After a further two 
hour incubation at room temperature, the plates were washed and TMB substrate added. 
The reaction was stopped after 20 min with 1 N sulfuric acid. Optical density was 
determined at 450 nm using 570 nm as a reference wavelength. Fractions that resulted 
in both replicates giving an OD two fold greater than the mean OD from cells cultured 
in medium alone, plus 3 standard deviations, were considered positive. 
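
The ELISA cutoff described above can be sketched as follows. The wording of the criterion admits more than one reading; this sketch assumes the cutoff is (2 x mean medium-alone OD) + (3 x standard deviation of the medium-alone ODs), and it assumes the sample standard deviation is intended. The function names and the example readings are hypothetical.

```python
from statistics import mean, stdev


def elisa_cutoff(medium_ods, fold=2.0, n_sd=3.0):
    """Cutoff OD, assumed here to be fold * mean + n_sd * sample SD of the controls."""
    return fold * mean(medium_ods) + n_sd * stdev(medium_ods)


def is_ifn_gamma_positive(replicate_ods, medium_ods):
    """Return True when every replicate OD exceeds the cutoff."""
    cutoff = elisa_cutoff(medium_ods)
    return all(od > cutoff for od in replicate_ods)


# Hypothetical OD readings (450 nm, reference-corrected) for medium-alone wells.
controls = [0.10, 0.12, 0.11]
print(round(elisa_cutoff(controls), 3))
print(is_ifn_gamma_positive([0.30, 0.28], controls))
```

Under the alternative reading, 2 x (mean + 3 SD), the cutoff would be higher; the text does not say which was used.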

For sequencing, the polypeptides were individually dried onto 
Biobrene™ (Perkin Elmer/Applied BioSystems Division, Foster City, CA) treated glass 
fiber filters. The filters with polypeptide were loaded onto a Perkin Elmer/Applied 
BioSystems Division Procise 492 protein sequencer. The polypeptides were sequenced 
from the amino terminus using traditional Edman chemistry. The amino acid 
sequence was determined for each polypeptide by comparing the retention time of the 
PTH amino acid derivative to the appropriate PTH derivative standards. 

Using the procedure described above, antigens having the following 
N-terminal sequences were isolated: 

(a) Asp-Pro-Val-Asp-Ala-Val-Ile-Asn-Thr-Thr-Xaa-Asn-Tyr-Gly- 
Gln-Val-Val-Ala-Ala-Leu; (SEQ ID No. 54) 

(b) Ala-Val-Glu-Ser-Gly-Met-Leu-Ala-Leu-Gly-Thr-Pro-Ala-Pro- 
Ser; (SEQ ID No. 55) 

(c) Ala-Ala-Met-Lys-Pro-Arg-Thr-Gly-Asp-Gly-Pro-Leu-Glu-Ala- 
Ala-Lys-Glu-Gly-Arg; (SEQ ID No. 56) 

(d) Tyr-Tyr-Trp-Cys-Pro-Gly-Gln-Pro-Phe-Asp-Pro-Ala-Trp-Gly- 
Pro; (SEQ ID No. 57) 

(e) Asp-Ile-Gly-Ser-Glu-Ser-Thr-Glu-Asp-Gln-Gln-Xaa-Ala-Val; 
(SEQ ID No. 58) 

(f) Ala-Glu-Glu-Ser-Ile-Ser-Thr-Xaa-Glu-Xaa-Ile-Val-Pro; (SEQ ID 
No. 59) 

(g) Asp-Pro-Glu-Pro-Ala-Pro-Pro-Val-Pro-Thr-Ala-Ala-Ala-Ala- 
Pro-Pro-Ala; (SEQ ID No. 60) and 

(h) Ala-Pro-Lys-Thr-Tyr-Xaa-Glu-Glu-Leu-Lys-Gly-Thr-Asp-Thr- 
Gly; (SEQ ID No. 61) 

wherein Xaa may be any amino acid. 

An additional antigen was isolated employing a microbore HPLC 
purification step in addition to the procedure described above. Specifically, 20 µl of a 
fraction comprising a mixture of antigens from the chromatographic purification step 
previously described, was purified on an Aquapore C18 column (Perkin Elmer/Applied 
Biosystems Division, Foster City, CA) with a 7 micron pore size, column size 1 mm x 
100 mm, in a Perkin Elmer/Applied Biosystems Division Model 172 HPLC. Fractions 
were eluted from the column with a linear gradient of 1%/minute of acetonitrile 
(containing 0.05% TFA) in water (0.05% TFA) at a flow rate of 80 µl/minute. The 
eluent was monitored at 250 nm. The original fraction was separated into 4 major peaks 
plus other smaller components and a polypeptide was obtained which was shown to 
have a molecular weight of 12.054 Kd (by mass spectrometry) and the following 
N-terminal sequence: 

(i) Asp-Pro-Ala-Ser-Ala-Pro-Asp-Val-Pro-Thr-Ala-Ala-Gln-Gln- 
Thr-Ser-Leu-Leu-Asn-Asn-Leu-Ala-Asp-Pro-Asp-Val-Ser-Phe- 
Ala-Asp (SEQ ID No. 62). 

This polypeptide was shown to induce proliferation and IFN-γ production in PBMC 
preparations using the assays described above. 

Additional soluble antigens were isolated from M. tuberculosis culture 
filtrate as follows. M. tuberculosis culture filtrate was prepared as described above. 
Following dialysis against Bis-Tris propane buffer, at pH 5.5, fractionation was 
performed using anion exchange chromatography on a Poros QE column 4.6 x 100 mm 
(Perseptive Biosystems) equilibrated in Bis-Tris propane buffer pH 5.5. Polypeptides 
were eluted with a linear 0-1.5 M NaCl gradient in the above buffer system at a flow 
rate of 10 ml/min. The column eluent was monitored at a wavelength of 214 nm. 

The fractions eluting from the ion exchange column were pooled and 
subjected to reverse phase chromatography using a Poros R2 column 4.6 x 100 mm 
(Perseptive Biosystems). Polypeptides were eluted from the column with a linear 
gradient from 0-100% acetonitrile (0.1% TFA) at a flow rate of 5 ml/min. The eluent 
was monitored at 214 nm. 

Fractions containing the eluted polypeptides were lyophilized and 
resuspended in 80 µl of aqueous 0.1% TFA and further subjected to reverse phase 
chromatography on a Vydac C4 column 4.6 x 150 mm (Western Analytical, Temecula, 
CA) with a linear gradient of 0-100% acetonitrile (0.1% TFA) at a flow rate of 2 
ml/min. Eluent was monitored at 214 nm. 

The fraction with biological activity was separated into one major peak 
plus other smaller components. Western blot of this peak onto PVDF membrane 
revealed three major bands of molecular weights 14 Kd, 20 Kd and 26 Kd. These 
polypeptides were determined to have the following N-terminal sequences, respectively: 

(j) Xaa-Asp-Ser-Glu-Lys-Ser-Ala-Thr-Ile-Lys-Val-Thr-Asp-Ala- 
Ser; (SEQ ID No. 134) 

(k) Ala-Gly-Asp-Thr-Xaa-Ile-Tyr-Ile-Val-Gly-Asn-Leu-Thr-Ala- 
Asp; (SEQ ID No. 135) and 

(l) Ala-Pro-Glu-Ser-Gly-Ala-Gly-Leu-Gly-Gly-Thr-Val-Gln-Ala- 
Gly; (SEQ ID No. 136), wherein Xaa may be any amino acid. 

Using the assays described above, these polypeptides were shown to induce 
proliferation and IFN-γ production in PBMC preparations. Figs. 1A and B show the 
results of such assays using PBMC preparations from a first and a second donor, 
respectively. 

DNA sequences that encode the antigens designated as (a), (c), (d) and 
(g) above were obtained by screening a genomic M. tuberculosis library using ³²P end 
labeled degenerate oligonucleotides corresponding to the N-terminal sequence and 
containing M. tuberculosis codon bias. The screen performed using a probe 
corresponding to antigen (a) above identified a clone having the sequence provided in 
SEQ ID No. 101. The polypeptide encoded by SEQ ID No. 101 is provided in SEQ ID 
No. 102. The screen performed using a probe corresponding to antigen (g) above 
identified a clone having the sequence provided in SEQ ID No. 52. The polypeptide 
encoded by SEQ ID No. 52 is provided in SEQ ID No. 53. The screen performed 
using a probe corresponding to antigen (d) above identified a clone having the sequence 
provided in SEQ ID No. 24, and the screen performed with a probe corresponding to 
antigen (c) identified a clone having the sequence provided in SEQ ID No. 25. 
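
To illustrate why codon bias matters when designing degenerate oligonucleotide probes from an N-terminal sequence, the following sketch counts how many distinct DNA sequences could encode a short peptide under the standard genetic code. It is not the probe design used here, which is not detailed in the text: the six-residue window, the function name and the decision to reject Xaa positions are assumptions made purely for illustration.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "Ala": 4, "Arg": 6, "Asn": 2, "Asp": 2, "Cys": 2,
    "Gln": 2, "Glu": 2, "Gly": 4, "His": 2, "Ile": 3,
    "Leu": 6, "Lys": 2, "Met": 1, "Phe": 2, "Pro": 4,
    "Ser": 6, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 4,
}


def probe_degeneracy(peptide):
    """Count the DNA sequences that encode a hyphen-separated three-letter peptide.

    Positions given as Xaa (unknown residue) are rejected, since a probe
    would normally be designed over a fully determined stretch.
    """
    total = 1
    for residue in peptide.split("-"):
        if residue == "Xaa":
            raise ValueError("peptide contains an undetermined residue (Xaa)")
        total *= CODONS_PER_RESIDUE[residue]
    return total


# First six residues of antigens (a) and (d); the window length is hypothetical.
print(probe_degeneracy("Asp-Pro-Val-Asp-Ala-Val"))   # 2*4*4*2*4*4 = 1024
print(probe_degeneracy("Tyr-Tyr-Trp-Cys-Pro-Gly"))   # 2*2*1*2*4*4 = 128
```

Restricting each position to the codons favored by the organism, as the reference to M. tuberculosis codon bias suggests, would reduce these counts and so reduce the complexity of the probe pool.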

The above amino acid sequences were compared to known amino acid 
sequences in the gene bank using the DNA STAR system. The database searched 
contains some 173,000 proteins and is a combination of the Swiss, PIR databases along 
with translated protein sequences (Version 87). No significant homologies to the amino 
acid sequences for antigens (a)-(h) and (l) were detected. 

The amino acid sequence for antigen (i) was found to be homologous to 
a sequence from M. leprae. The full length M. leprae sequence was amplified from 
genomic DNA using the sequence obtained from GENBANK. This sequence was then 
used to screen the M. tuberculosis library described below in Example 2 and a full 
length copy of the M. tuberculosis homologue was obtained (SEQ ID No. 99). 

The amino acid sequence for antigen (j) was found to be homologous to 
a known M. tuberculosis protein translated from a DNA sequence. To the best of the 
inventors' knowledge, this protein has not been previously shown to possess T-cell 
stimulatory activity. The amino acid sequence for antigen (k) was found to be related to 
a sequence from M. leprae. 

In the proliferation and IFN-γ assays described above, using three PPD 
positive donors, the results for representative antigens provided above are presented in 
Table 1: 

TABLE 1 

Results of PBMC Proliferation and IFN-γ Assays 

Sequence    Proliferation    IFN-γ 

(a)         + 
(c)         +++              +++ 
(d)         ++               ++ 
(g)         +++              +++ 
(h)         +++              +++ 


In Table 1, responses that gave a stimulation index (SI) of between 2 and 
4 (compared to cells cultured in medium alone) were scored as +, an SI of 4-8 or 2-4 at 
a concentration of 1 µg or less was scored as ++ and an SI of greater than 8 was scored 
as +++. The antigen of sequence (i) was found to have a high SI (+++) for one donor 
and lower SI (++ and +) for the two other donors in both proliferation and IFN-γ assays. 
These results indicate that these antigens are capable of inducing proliferation and/or 
interferon-γ production. 

EXAMPLE 2 

Use of Patient Sera to Isolate M. tuberculosis Antigens 

This example illustrates the isolation of antigens from M. tuberculosis 
lysate by screening with serum from M. tuberculosis-infected individuals. 

Desiccated M. tuberculosis H37Ra (Difco Laboratories) was added to a 
2% NP40 solution, and alternately homogenized and sonicated three times. The 
resulting suspension was centrifuged at 13,000 rpm in microfuge tubes and the 
supernatant put through a 0.2 micron syringe filter. The filtrate was bound to Macro 
Prep DEAE beads (BioRad, Hercules, CA). The beads were extensively washed with 
20 mM Tris pH 7.5 and bound proteins eluted with 1M NaCl. The 1M NaCl eluate was 
dialyzed overnight against 10 mM Tris, pH 7.5. Dialyzed solution was treated with 
DNase and RNase at 0.05 mg/ml for 30 min. at room temperature and then with 
α-D-mannosidase, 0.5 U/mg at pH 4.5 for 3-4 hours at room temperature. After returning to 
pH 7.5, the material was fractionated via FPLC over a Bio Scale-Q-20 column 
(BioRad). Fractions were combined into nine pools, concentrated in a Centriprep 10 
(Amicon, Beverly, MA) and then screened by Western blot for serological activity 
using a serum pool from M. tuberculosis-infected patients which was not 
immunoreactive with other antigens of the present invention. 

The most reactive fraction was run in SDS-PAGE and transferred to 
PVDF. A band at approximately 85 Kd was cut out yielding the sequence: 

(m) Xaa-Tyr-Ile-Ala-Tyr-Xaa-Thr-Thr-Ala-Gly-Ile-Val-Pro-Gly-Lys- 
Ile-Asn-Val-His-Leu-Val; (SEQ ID No. 137), wherein Xaa may 
be any amino acid. 

Comparison of this sequence with those in the gene bank as described 
above, revealed no significant homologies to known sequences. 

A DNA sequence that encodes the antigen designated as (m) above was 
obtained by screening a genomic M. tuberculosis Erdman strain library using labeled 
degenerate oligonucleotides corresponding to the N-terminal sequence of SEQ ID 
NO: 137. A clone was identified having the DNA sequence provided in SEQ ID NO: 
203. This sequence was found to encode the amino acid sequence provided in SEQ ID 
NO: 204. Comparison of these sequences with those in the genebank revealed some 
similarity to sequences previously identified in M. tuberculosis and M. bovis. 

EXAMPLE 3 

Preparation of DNA Sequences Encoding M. tuberculosis Antigens 

This example illustrates the preparation of DNA sequences encoding 
M. tuberculosis antigens by screening a M. tuberculosis expression library with sera 
obtained from patients infected with M. tuberculosis, or with anti-sera raised against 
soluble M. tuberculosis antigens. 

A. Preparation of M. tuberculosis Soluble Antigens using Rabbit Anti-sera 
Raised Against M. tuberculosis Supernatant 

Genomic DNA was isolated from the M. tuberculosis strain H37Ra. The 
DNA was randomly sheared and used to construct an expression library using the 
Lambda ZAP expression system (Stratagene, La Jolla, CA). Rabbit anti-sera was 
generated against secretory proteins of the M. tuberculosis strains H37Ra, H37Rv and 
Erdman by immunizing a rabbit with concentrated supernatant of the M. tuberculosis 
cultures. Specifically, the rabbit was first immunized subcutaneously with 200 µg of 
protein antigen in a total volume of 2 ml containing 10 µg muramyl dipeptide 
(Calbiochem, La Jolla, CA) and 1 ml of incomplete Freund's adjuvant. Four weeks later 
the rabbit was boosted subcutaneously with 100 µg antigen in incomplete Freund's 
adjuvant. Finally, the rabbit was immunized intravenously four weeks later with 50 µg 
protein antigen. The anti-sera were used to screen the expression library as described in 
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor 
Laboratories, Cold Spring Harbor, NY, 1989. Bacteriophage plaques expressing 
immunoreactive antigens were purified. Phagemid from the plaques was rescued and 
the nucleotide sequences of the M. tuberculosis clones deduced. 

Thirty-two clones were purified. Of these, 25 represent sequences that 
have not been previously identified in human M. tuberculosis. Recombinant antigens 
were expressed and purified antigens used in the immunological analysis described in 
Example 1. Proteins were induced by IPTG and purified by gel elution, as described in 
Skeiky et al., J. Exp. Med. 181:1527-1537, 1995. Representative sequences of DNA 
molecules identified in this screen are provided in SEQ ID Nos.: 1-25. The 
corresponding predicted amino acid sequences are shown in SEQ ID Nos. 63-87. 

On comparison of these sequences with known sequences in the gene 
bank using the databases described above, it was found that the clones referred to 
hereinafter as TbRA2A, TbRA16, TbRA18, and TbRA29 (SEQ ID Nos. 76, 68, 70, 75) 
show some homology to sequences previously identified in Mycobacterium leprae but 
not in M. tuberculosis. TbRA11, TbRA26, TbRA28 and TbDPEP (SEQ ID Nos.: 65, 
73, 74, 53) have been previously identified in M. tuberculosis. No significant 
homologies were found to TbRA1, TbRA3, TbRA4, TbRA9, TbRA10, TbRA13, 
TbRA17, TbRA19, TbRA29, TbRA32, TbRA36 and the overlapping clones TbRA35 
and TbRA12 (SEQ ID Nos. 63, 77, 81, 82, 64, 67, 69, 71, 75, 78, 80, 79, 66). The 
clone TbRa24 is overlapping with clone TbRa29. 

The results of PBMC proliferation and interferon-γ assays performed on 
representative recombinant antigens, and using T-cell preparations from several 
different M. tuberculosis-immune patients, are presented in Tables 2 and 3, 
respectively. 

In Tables 2 and 3, responses that gave a stimulation index (SI) of 
between 1.2 and 2 (compared to cells cultured in medium alone) were scored as ±, an SI 
of 2-4 was scored as +, an SI of 4-8 or 2-4 at a concentration of 1 µg or less was scored 
as ++ and an SI of greater than 8 was scored as +++. In addition, the effect of 
concentration on proliferation and interferon-γ production is shown for two of the above 
antigens in the attached Figure. For both proliferation and interferon-γ production, 
TbRa3 was scored as ++ and TbRa9 as +. 
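
The scoring scheme described for Tables 2 and 3 can be expressed as a small function. This is a sketch only: the text does not say how values falling exactly on a boundary (for example an SI of exactly 4) were handled, nor how an SI below 1.2 was recorded, so the inclusive lower bounds and the "-" symbol used here are assumptions, as are the function and parameter names.

```python
def score_stimulation_index(si, conc_ug=None):
    """Map a stimulation index (SI) to the ±/+/++/+++ scale used in Tables 2 and 3.

    si:      response with antigen divided by response in medium alone.
    conc_ug: antigen concentration in micrograms, if known; an SI of 2-4
             obtained at 1 microgram or less is promoted to "++".
    """
    if si > 8:
        return "+++"
    if si >= 4:                      # 4-8 (boundary handling is an assumption)
        return "++"
    if si >= 2:                      # 2-4
        if conc_ug is not None and conc_ug <= 1:
            return "++"
        return "+"
    if si >= 1.2:                    # 1.2-2
        return "±"
    return "-"                       # below the lowest defined band (assumed symbol)


# Hypothetical stimulation indices.
for si, conc in [(9.5, 10), (5.0, 10), (3.0, 0.5), (3.0, 10), (1.5, 10), (1.0, 10)]:
    print(si, conc, score_stimulation_index(si, conc))
```

The same function covers the three-level scale of Table 1 if the ± band is ignored.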

These results indicate that these soluble antigens can induce proliferation 
and/or interferon-γ production in T-cells derived from an M. tuberculosis-immune 
individual. 



B. Use of Sera From Patients having Pulmonary or Pleural Tuberculosis 
to Identify DNA Sequences Encoding M. tuberculosis Antigens 

The genomic DNA library described above, and an additional H37Rv 
library, were screened using pools of sera obtained from patients with active 
tuberculosis. To prepare the H37Rv library, M. tuberculosis strain H37Rv genomic 
DNA was isolated, subjected to partial Sau3A digestion and used to construct an 
expression library using the Lambda Zap expression system (Stratagene, La Jolla, CA). 
Three different pools of sera, each containing sera obtained from three individuals with 
active pulmonary or pleural disease, were used in the expression screening. The pools 
were designated TbL, TbM and TbH, referring to relative reactivity with H37Ra lysate 
(i.e., TbL = low reactivity, TbM = medium reactivity and TbH = high reactivity) in both 
ELISA and immunoblot format. A fourth pool of sera from seven patients with active 
pulmonary tuberculosis was also employed. All of the sera lacked increased reactivity 
with the recombinant 38 kD M. tuberculosis H37Ra phosphate-binding protein. 

All pools were pre-adsorbed with E. coli lysate and used to screen the 
H37Ra and H37Rv expression libraries, as described in Sambrook et al., Molecular 
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, 
NY, 1989. Bacteriophage plaques expressing immunoreactive antigens were purified. 
Phagemid from the plaques was rescued and the nucleotide sequences of the 
M. tuberculosis clones deduced. 

Thirty-two clones were purified. Of these, 31 represented sequences that 
had not been previously identified in human M. tuberculosis. Representative sequences 
of the DNA molecules identified are provided in SEQ ID Nos.: 26-51 and 105. Of 
these, TbH-8-2 (SEQ. ID NO. 105) is a partial clone of TbH-8, and TbH-4 (SEQ. ID 
NO. 43) and TbH-4-FWD (SEQ. ID NO. 44) are non-contiguous sequences from the 
same clone. Amino acid sequences for the antigens hereinafter identified as Tb38-1, 
TbH-4, TbH-8, TbH-9, and TbH-12 are shown in SEQ ID Nos.: 88-92. Comparison of 
these sequences with known sequences in the gene bank using the databases identified 
above revealed no significant homologies to TbH-4, TbH-8, TbH-9 and TbM-3, 
although weak homologies were found to TbH-9. TbH-12 was found to be homologous 
to a 34 kD antigenic protein previously identified in M. paratuberculosis (Acc. 
No. S28515). Tb38-1 was found to be located 34 base pairs upstream of the open 
reading frame for the antigen ESAT-6 previously identified in M. bovis (Acc. 
No. U34848) and in M. tuberculosis (Sorensen et al., Infect. Immun. 63:1710-1717, 
1995). 

Probes derived from Tb38-1 and TbH-9, both isolated from an H37Ra 
library, were used to identify clones in an H37Rv library. Tb38-1 hybridized to 
Tb38-1F2, Tb38-1F3, Tb38-1F5 and Tb38-1F6 (SEQ. ID NOS. 112, 113, 116, 118, and 
119). (SEQ ID NOS. 112 and 113 are non-contiguous sequences from clone 
Tb38-1F2.) Two open reading frames were deduced in Tb38-1F2; one corresponds to Tb37FL 
(SEQ. ID. NO. 114), the second, a partial sequence, may be the homologue of Tb38-1 
and is called Tb38-IN (SEQ. ID NO. 115). The deduced amino acid sequence of 
Tb38-1F3 is presented in SEQ. ID. NO. 117. A TbH-9 probe identified three clones in the 
H37Rv library: TbH-9-FL (SEQ. ID NO. 106), which may be the homologue of TbH-9 
(H37Ra), TbH-9-1 (SEQ. ID NO. 108), and TbH-9-4 (SEQ. ID NO. 110), all of which 
are highly related sequences to TbH-9. The deduced amino acid sequences for these 
three clones are presented in SEQ ID NOS. 107, 109 and 111. 

Further screening of the M. tuberculosis genomic DNA library, as 
described above, resulted in the recovery of ten additional reactive clones, representing 
seven different genes. One of these genes was identified as the 38 Kd antigen discussed 
above, one was determined to be identical to the 14 Kd alpha crystallin heat shock 
protein previously shown to be present in M. tuberculosis, and a third was determined 
to be identical to the antigen TbH-8 described above. The determined DNA sequences 
for the remaining five clones (hereinafter referred to as TbH-29, TbH-30, TbH-32 and 
TbH-33) are provided in SEQ ID NO: 138-141, respectively, with the corresponding 
predicted amino acid sequences being provided in SEQ ID NO: 142-145, respectively. 
The DNA and amino acid sequences for these antigens were compared with those in the 
gene bank as described above. No homologies were found to the 5' end of TbH-29 
(which contains the reactive open reading frame), although the 3' end of TbH-29 was 
found to be identical to the M. tuberculosis cosmid Y227. TbH-32 and TbH-33 were 
found to be identical to the previously identified M. tuberculosis insertion element 
IS6110 and to the M. tuberculosis cosmid Y50, respectively. No significant homologies 
to TbH-30 were found. 

Positive phagemid from this additional screening were used to infect 
E. coli XL-1 Blue MRF', as described in Sambrook et al., supra. Induction of recombinant 
protein was accomplished by the addition of IPTG. Induced and uninduced lysates 
were run in duplicate on SDS-PAGE and transferred to nitrocellulose filters. Filters 
were reacted with human M. tuberculosis sera (1:200 dilution) reactive with TbH and a 
rabbit sera (1:200 or 1:250 dilution) reactive with the N-terminal 4 Kd portion of lacZ. 
Sera incubations were performed for 2 hours at room temperature. Bound antibody was 
detected by addition of ¹²⁵I-labeled Protein A and subsequent exposure to film for 
variable times ranging from 16 hours to 11 days. The results of the immunoblots are 
summarized in Table 4. 

TABLE 4 

Antigen     Human M. tb Sera     Anti-lacZ Sera 

TbH-29      45 Kd                45 Kd 
TbH-30      No reactivity        29 Kd 
TbH-32      12 Kd                12 Kd 
TbH-33      16 Kd                16 Kd 


Positive reaction of the recombinant human M. tuberculosis antigens 
with both the human M. tuberculosis sera and anti-lacZ sera indicates that reactivity of 
the human M. tuberculosis sera is directed towards the fusion protein. Antigens 
reactive with the anti-lacZ sera but not with the human M. tuberculosis sera may be the 
result of the human M. tuberculosis sera recognizing conformational epitopes, or the 
antigen-antibody binding kinetics may be such that the 2 hour sera exposure in the 
immunoblot is not sufficient. 



The results of T-cell assays performed on Tb38-1, ESAT-6 and other 
representative recombinant antigens are presented in Tables 5A, 5B and 6, respectively, 
below: 

TABLE 5A 

Results of PBMC Proliferation to Representative Antigens 

TABLE 5B 

Results of PBMC Interferon-γ Production to Representative Antigens 

TABLE 6 

Summary of T-cell Responses to Representative Antigens 

These results indicate that both the inventive M. tuberculosis antigens 
and ESAT-6 can induce proliferation and/or interferon-γ production in T-cells derived 
from an M. tuberculosis-immune individual. To the best of the inventors' knowledge, 
ESAT-6 has not been previously shown to stimulate human immune responses. 

A set of six overlapping peptides covering the amino acid sequence of 
the antigen Tb38-1 was constructed using the method described in Example 6. The 
sequences of these peptides, hereinafter referred to as pep 1-6, are provided in SEQ ID 
Nos. 93-98, respectively. The results of T-cell assays using these peptides are shown in 
Tables 7 and 8. These results confirm the existence, and help to localize T-cell epitopes 
within Tb38-1 capable of inducing proliferation and interferon-γ production in T-cells 
derived from an M. tuberculosis immune individual. 

Studies were undertaken to determine whether the antigens TbH-9 and Tb38-1 
represent cellular proteins or are secreted into M. tuberculosis culture media. In the first 
study, rabbit sera were raised against A) secretory proteins of M. tuberculosis, B) the known 
secretory recombinant M. tuberculosis antigen 85b, C) recombinant Tb38-1 and D) 
recombinant TbH-9, using protocols substantially the same as that described in Example 
3A. Total M. tuberculosis lysate, concentrated supernatant of M. tuberculosis cultures and 
the recombinant antigens 85b, TbH-9 and Tb38-1 were resolved on denaturing gels, 
immobilized on nitrocellulose membranes and duplicate blots were probed using the rabbit 
sera described above. 

The results of this analysis using control sera (panel I) and antisera (panel II) 
against secretory proteins, recombinant 85b, recombinant Tb38-1 and recombinant TbH-9 are 
shown in Figures 3A-D, respectively, wherein the lane designations are as follows: 1) 
molecular weight protein standards; 2) 5 µg of M. tuberculosis lysate; 3) 5 µg secretory 
proteins; 4) 50 ng recombinant Tb38-1; 5) 50 ng recombinant TbH-9; and 6) 50 ng 
recombinant 85b. The recombinant antigens were engineered with six terminal histidine 
residues and would therefore be expected to migrate with a mobility approximately 1 kD 
larger than the native protein. In Figure 3D, recombinant TbH-9 is lacking approximately 10 
kD of the full-length 42 kD antigen, hence the significant difference in the size of the 
immunoreactive native TbH-9 antigen in the lysate lane (indicated by an arrow). These 
results demonstrate that Tb38-1 and TbH-9 are intracellular antigens and are not actively 
secreted by M. tuberculosis. 
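
The size shift expected from the histidine tag, and the apparent size of the truncated recombinant TbH-9, can be checked with simple arithmetic. This is a rough sketch: it uses the average residue mass of histidine (137.14 Da), ignores any linker residues in the expression construct, and assumes gel mobility tracks mass, none of which is stated in the text. The function names are hypothetical.

```python
HIS_RESIDUE_MASS_DA = 137.14  # average mass of one histidine residue within a peptide chain


def his_tag_mass_kd(n_his=6):
    """Mass contributed by a run of histidine residues, in kD."""
    return n_his * HIS_RESIDUE_MASS_DA / 1000.0


def expected_recombinant_kd(full_length_kd, missing_kd=0.0, n_his=6):
    """Approximate mass of a tagged, possibly truncated, recombinant antigen."""
    return full_length_kd - missing_kd + his_tag_mass_kd(n_his)


# Six histidines add a little under 1 kD, in line with "approximately 1 kD".
print(round(his_tag_mass_kd(), 2))
# Recombinant TbH-9: 42 kD full length, lacking approximately 10 kD, plus the tag.
print(round(expected_recombinant_kd(42.0, missing_kd=10.0), 1))
```

On these assumptions the recombinant TbH-9 would run roughly 9 kD below the native antigen, which is consistent with the size difference noted for Figure 3D.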

The finding that TbH-9 is an intracellular antigen was confirmed by 
determining the reactivity of TbH-9-specific human T cell clones to recombinant TbH-9, 
secretory M. tuberculosis proteins and PPD. A TbH-9-specific T cell clone (designated 
131TbH-9) was generated from PBMC of a healthy PPD-positive donor. The proliferative 
response of 131TbH-9 to secretory proteins, recombinant TbH-9 and a control 
M. tuberculosis antigen, TbRa11, was determined by measuring uptake of tritiated thymidine, as 
described in Example 1. As shown in Figure 4A, the clone 131TbH-9 responds specifically 
to TbH-9, showing that TbH-9 is not a significant component of M. tuberculosis secretory 
proteins. Figure 4B shows the production of IFN-γ by a second TbH-9-specific T cell clone 
(designated PPD 800-10) prepared from PBMC from a healthy PPD-positive donor, 
following stimulation of the T cell clone with secretory proteins, PPD or recombinant TbH-9. 
These results further confirm that TbH-9 is not secreted by M. tuberculosis. 

C. Use of Sera From Patients having Extrapulmonary Tuberculosis to Identify 
DNA Sequences Encoding M. tuberculosis Antigens 

Genomic DNA was isolated from M. tuberculosis Erdman strain, randomly 
sheared and used to construct an expression library employing the Lambda ZAP expression 
system (Stratagene, La Jolla, CA). The resulting library was screened using pools of sera 
obtained from individuals with extrapulmonary tuberculosis, as described above in Example 
3B, with the secondary antibody being goat anti-human IgG + A + M (H+L) conjugated with 
alkaline phosphatase. 

Eighteen clones were purified. Of these, 4 clones (hereinafter referred to as 
XP14, XP24, XP31 and XP32) were found to bear some similarity to known sequences. The 
determined DNA sequences for XP14, XP24 and XP31 are provided in SEQ ID Nos.: 156- 
158, respectively, with the 5' and 3' DNA sequences for XP32 being provided in SEQ ID 
Nos.: 159 and 160, respectively. The predicted amino acid sequence for XP14 is provided in 
SEQ ID No: 161. The reverse complement of XP14 was found to encode the amino acid 
sequence provided in SEQ ID No.: 162. 

Comparison of the sequences for the remaining 14 clones (hereinafter referred 
to as XP1-XP6, XP17-XP19, XP22, XP25, XP27, XP30 and XP36) with those in the 
genebank as described above, revealed no homologies with the exception of the 3' ends of 
XP2 and XP6 which were found to bear some homology to known M. tuberculosis cosmids. 
The DNA sequences for XP27 and XP36 are shown in SEQ ID Nos.: 163 and 164, 
respectively, with the 5' sequences for XP4, XP5, XP17 and XP30 being shown in SEQ ID 
Nos: 165-168, respectively, and the 5' and 3' sequences for XP2, XP3, XP6, XP18, XP19, 
XP22 and XP25 being shown in SEQ ID Nos: 169 and 170; 171 and 172; 173 and 174; 175 
and 176; 177 and 178; 179 and 180; and 181 and 182, respectively. XP1 was found to 
overlap with the DNA sequences for TbH4, disclosed above. The full-length DNA sequence 
for TbH4-XP1 is provided in SEQ ID No.: 183. This DNA sequence was found to contain an 
open reading frame encoding the amino acid sequence shown in SEQ ID No: 184. The 
reverse complement of TbH4-XP1 was found to contain an open reading frame encoding the 
amino acid sequence shown in SEQ ID No.: 185. The DNA sequence for XP36 was found to 
contain two open reading frames encoding the amino acid sequences shown in SEQ ID Nos.: 
186 and 187, with the reverse complement containing an open reading frame encoding the 
amino acid sequence shown in SEQ ID No.: 188. 

Recombinant XP1 protein was prepared as described above in Example 3B, 
with a metal ion affinity chromatography column being employed for purification. As 
illustrated in Figures 8A-B and 9A-B, using the assays described herein, recombinant XP1 
was found to stimulate cell proliferation and IFN-γ production in T cells isolated from 
M. tuberculosis-immune donors. 



D. Preparation of M. tuberculosis Soluble Antigens using Rabbit Anti-sera 
Raised Against M. tuberculosis Fractionated Proteins 

M. tuberculosis lysate was prepared as described above in Example 2. The 
resulting material was fractionated by HPLC and the fractions screened by Western blot for 
serological activity with a serum pool from M. tuberculosis-infected patients which showed 
little or no immunoreactivity with other antigens of the present invention. Rabbit anti-sera 
was generated against the most reactive fraction using the method described in Example 3A. 
The anti-sera was used to screen an M. tuberculosis Erdman strain genomic DNA expression 
library prepared as described above. Bacteriophage plaques expressing immunoreactive 
antigens were purified. Phagemid from the plaques was rescued and the nucleotide sequences 
of the M. tuberculosis clones determined. 

Ten different clones were purified. Of these, one was found to be TbRa35,
described above, and one was found to be the previously identified M. tuberculosis antigen,
HSP60. Of the remaining eight clones, seven (hereinafter referred to as RDIF2, RDIF5,
RDIF8, RDIF10, RDIF11 and RDIF12) were found to bear some similarity to previously
identified M. tuberculosis sequences. The determined DNA sequences for RDIF2, RDIF5,
RDIF8, RDIF10 and RDIF11 are provided in SEQ ID Nos.: 189-193, respectively, with the
corresponding predicted amino acid sequences being provided in SEQ ID Nos.: 194-198,
respectively. The 5' and 3' DNA sequences for RDIF12 are provided in SEQ ID Nos.: 199
and 200, respectively. No significant homologies were found to the antigen RDIF7. The
determined DNA and predicted amino acid sequences for RDIF7 are provided in SEQ ID
Nos.: 201 and 202, respectively. One additional clone, referred to as RDIF6, was isolated;
however, this was found to be identical to RDIF5.

Recombinant RDIF6, RDIF8, RDIF10 and RDIF11 were prepared as
described above. As shown in Figures 8A-B and 9A-B, these antigens were found to
stimulate cell proliferation and IFN-γ production in T cells isolated from
M. tuberculosis-immune donors.

EXAMPLE 4 

Purification and Characterization of a Polypeptide from Tuberculin Purified
Protein Derivative

An M. tuberculosis polypeptide was isolated from tuberculin purified protein 
derivative (PPD) as follows. 

PPD was prepared as published with some modification (Seibert, F. et al., 
Tuberculin purified protein derivative. Preparation and analyses of a large quantity for 
standard. The American Review of Tuberculosis 44:9-25. 1941). 

M. tuberculosis Rv strain was grown for 6 weeks in synthetic medium in roller
bottles at 37°C. Bottles containing the bacterial growth were then heated to 100°C in water
vapor for 3 hours. Cultures were sterile filtered using a 0.22 µ filter and the liquid phase was
concentrated 20 times using a 3 kD cut-off membrane. Proteins were precipitated once with
50% ammonium sulfate solution and eight times with 25% ammonium sulfate solution. The
resulting proteins (PPD) were fractionated by reverse phase liquid chromatography
(RP-HPLC) using a C18 column (7.8 x 300 mm; Waters, Milford, MA) in a Biocad HPLC system
(Perseptive Biosystems, Framingham, MA). Fractions were eluted from the column with a
linear gradient from 0-100% buffer (0.1% TFA in acetonitrile). The flow rate was 10
ml/minute and eluent was monitored at 214 nm and 280 nm.

Six fractions were collected, dried, suspended in PBS and tested individually
in M. tuberculosis-infected guinea pigs for induction of delayed type hypersensitivity (DTH)
reaction. One fraction was found to induce a strong DTH reaction and was subsequently
fractionated further by RP-HPLC on a microbore Vydac C18 column (Cat. No. 218TP5115)
in a Perkin Elmer/Applied Biosystems Division Model 172 HPLC. Fractions were eluted
with a linear gradient from 5-100% buffer (0.05% TFA in acetonitrile) with a flow rate of 80
µl/minute. Eluent was monitored at 215 nm. Eight fractions were collected and tested for
induction of DTH in M. tuberculosis-infected guinea pigs. One fraction was found to induce
strong DTH of about 16 mm induration. The other fractions did not induce detectable DTH.
The positive fraction was submitted to SDS-PAGE gel electrophoresis and found to contain a
single protein band of approximately 12 kD molecular weight.

This polypeptide, hereinafter referred to as DPPD, was sequenced from the
amino terminal using a Perkin Elmer/Applied Biosystems Division Procise 492 protein
sequencer as described above and found to have the N-terminal sequence shown in SEQ ID
No.: 129. Comparison of this sequence with known sequences in the gene bank as described
above revealed no known homologies. Four cyanogen bromide fragments of DPPD were
isolated and found to have the sequences shown in SEQ ID Nos.: 130-133.

The ability of the antigen DPPD to stimulate human PBMC to proliferate and
to produce IFN-γ was assayed as described in Example 1. As shown in Table 9, DPPD was
found to stimulate proliferation and elicit production of large quantities of IFN-γ; more than
that elicited by commercial PPD.

TABLE 9

Results of Proliferation and Interferon-γ Assays to DPPD

PBMC Donor    Stimulator          Proliferation (CPM)    IFN-γ (OD450)

A             Medium                     1,089                0.17
              PPD (commercial)           8,394                1.29
              DPPD                      13,451                2.21

B             Medium                       450                0.09
              PPD (commercial)           3,929                1.26
              DPPD                       6,184                1.49

C             Medium                       541                0.11
              PPD (commercial)           8,907                0.76
              DPPD                      23,024               >2.70
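One conventional way to compare the responses in Table 9 is as a stimulation index, that is, the ratio of CPM in the presence of a stimulator to CPM in medium alone. This index is not reported in the specification; the following Python sketch is offered only as an illustration, and the name stimulation_index is hypothetical.

```python
# Proliferation values (CPM) transcribed from Table 9.
TABLE_9_CPM = {
    "A": {"Medium": 1089, "PPD (commercial)": 8394, "DPPD": 13451},
    "B": {"Medium": 450, "PPD (commercial)": 3929, "DPPD": 6184},
    "C": {"Medium": 541, "PPD (commercial)": 8907, "DPPD": 23024},
}


def stimulation_index(cpm_by_stimulator, stimulator):
    """Ratio of stimulated CPM to medium-only CPM (an assumed metric,
    not one defined in the specification)."""
    return cpm_by_stimulator[stimulator] / cpm_by_stimulator["Medium"]


for donor, cpm in TABLE_9_CPM.items():
    ppd = stimulation_index(cpm, "PPD (commercial)")
    dppd = stimulation_index(cpm, "DPPD")
    print(f"Donor {donor}: PPD index = {ppd:.1f}, DPPD index = {dppd:.1f}")
```

For each of the three donors the DPPD index computed this way is higher than the index for commercial PPD, consistent with the comparison drawn in the text.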

EXAMPLE 5 

USE OF REPRESENTATIVE ANTIGENS FOR DIAGNOSIS OF TUBERCULOSIS 

This example illustrates the effectiveness of several representative
polypeptides in skin tests for the diagnosis of M. tuberculosis infection.

Individuals were injected intradermally with 100 µl of either PBS or PBS plus
Tween 20™ containing either 0.1 µg of protein (for TbH-9 and TbRa35) or 1.0 µg of protein
(for TbRa38-1). Induration was measured between 5-7 days after injection, with a response
of 5 mm or greater being considered positive. Of the 20 individuals tested, 2 were PPD
negative and 18 were PPD positive. Of the PPD positive individuals, 3 had active
tuberculosis, 3 had been previously infected with tuberculosis and 9 were healthy. In a
second study, 13 PPD positive individuals were tested with 0.1 µg TbRa11 in either PBS or
PBS plus Tween 20™ as described above. The results of both studies are shown in Table 10.

TABLE 10

RESULTS OF DTH TESTING WITH REPRESENTATIVE ANTIGENS

                 TbH-9       Tb38-1      TbRa35      Cumulative   TbRa11
                 Pos/Total   Pos/Total   Pos/Total   Pos/Total    Pos/Total

PPD negative     0/2         0/2         0/2         0/2

PPD positive
  healthy        5/9         4/9         4/9         6/9          1/4
  prior TB       3/5         2/5         2/5         4/5          3/5
  active         3/4         3/4         0/4         4/4          1/4

TOTAL            11/18       9/18        6/18        14/18        5/13
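The Pos/Total entries of Table 10 can be checked for internal consistency and converted to percentages. The following Python sketch is illustrative only; the dictionary layout and the function names are assumptions made for the example and are not part of the specification.

```python
from fractions import Fraction

# Pos/Total counts for PPD positive individuals, transcribed from Table 10.
SUBGROUPS = {
    "TbH-9":      {"healthy": (5, 9), "prior TB": (3, 5), "active": (3, 4)},
    "Tb38-1":     {"healthy": (4, 9), "prior TB": (2, 5), "active": (3, 4)},
    "TbRa35":     {"healthy": (4, 9), "prior TB": (2, 5), "active": (0, 4)},
    "Cumulative": {"healthy": (6, 9), "prior TB": (4, 5), "active": (4, 4)},
    "TbRa11":     {"healthy": (1, 4), "prior TB": (3, 5), "active": (1, 4)},
}

# TOTAL row of Table 10.
TOTALS = {
    "TbH-9": (11, 18),
    "Tb38-1": (9, 18),
    "TbRa35": (6, 18),
    "Cumulative": (14, 18),
    "TbRa11": (5, 13),
}


def column_total(antigen):
    """Sum positives and tested individuals over the three subgroups."""
    pos = sum(p for p, _ in SUBGROUPS[antigen].values())
    tested = sum(t for _, t in SUBGROUPS[antigen].values())
    return pos, tested


def percent_positive(pos, total):
    """Percentage of tested individuals scored positive."""
    return float(Fraction(pos, total) * 100)


for antigen, (pos, total) in TOTALS.items():
    assert column_total(antigen) == (pos, total)
    print(f"{antigen}: {pos}/{total} = {percent_positive(pos, total):.0f}%")
```

The subgroup counts sum to the TOTAL row for every column, which supports the reading of the table given above.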



EXAMPLE 6 
Synthesis of Synthetic Polypeptides 

Polypeptides may be synthesized on a Millipore 9050 peptide synthesizer
using FMOC chemistry with HPTU (O-Benzotriazole-N,N,N',N'-tetramethyluronium
hexafluorophosphate) activation. A Gly-Cys-Gly sequence may be attached to the amino
terminus of the peptide to provide a method of conjugation or labeling of the peptide.
Cleavage of the peptides from the solid support may be carried out using the following
cleavage mixture: trifluoroacetic acid:ethanedithiol:thioanisole:water:phenol (40:1:2:2:3).
After cleaving for 2 hours, the peptides may be precipitated in cold methyl-t-butyl-ether. The
peptide pellets may then be dissolved in water containing 0.1% trifluoroacetic acid (TFA) and
lyophilized prior to purification by C18 reverse phase HPLC. A gradient of 0%-60%
acetonitrile (containing 0.1% TFA) in water (containing 0.1% TFA) may be used to elute the
peptides. Following lyophilization of the pure fractions, the peptides may be characterized
using electrospray mass spectrometry and by amino acid analysis.

EXAMPLE 7

Preparation and Characterization of M. tuberculosis Fusion Proteins

A fusion protein containing TbRa3, the 38 kD antigen and Tb38-1 was
prepared as follows.

Each of the DNA constructs TbRa3, 38 kD and Tb38-1 was modified by PCR
in order to facilitate their fusion and the subsequent expression of the fusion protein
TbRa3-38 kD-Tb38-1. TbRa3, 38 kD and Tb38-1 DNA was used to perform PCR using the primers
PDM-64 and PDM-65 (SEQ ID NO: 146 and 147), PDM-57 and PDM-58 (SEQ ID NO: 148
and 149), and PDM-69 and PDM-60 (SEQ ID NO: 150 and 151), respectively. In each case,
the DNA amplification was performed using 10 µl 10X Pfu buffer, 2 µl 10 mM dNTPs, 2 µl
each of the PCR primers at 10 µM concentration, 81.5 µl water, 1.5 µl Pfu DNA polymerase
(Stratagene, La Jolla, CA) and 1 µl DNA at either 70 ng/µl (for TbRa3) or 50 ng/µl (for 38
kD and Tb38-1). For TbRa3, denaturation at 94°C was performed for 2 min, followed by 40
cycles of 96°C for 15 sec and 72°C for 1 min, and lastly by 72°C for 4 min. For 38 kD,
denaturation at 96°C was performed for 2 min, followed by 40 cycles of 96°C for 30 sec,
68°C for 15 sec and 72°C for 3 min, and finally by 72°C for 4 min. For Tb38-1, denaturation
at 94°C for 2 min was followed by 10 cycles of 96°C for 15 sec, 68°C for 15 sec and 72°C for
1.5 min, 30 cycles of 96°C for 15 sec, 64°C for 15 sec and 72°C for 1.5 min, and finally by 72°C
for 4 min.
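The component volumes listed for each amplification add up to a 100 µl reaction, from which the final concentration of each reagent follows by simple dilution. The Python sketch below works through that arithmetic; the component labels (for example "forward primer") and the function name are assumptions introduced for the illustration.

```python
# Volumes in microliters, as listed for each amplification reaction.
# The forward/reverse labels are illustrative; the text says only
# "2 ul each of the PCR primers".
REACTION_UL = {
    "10X Pfu buffer": 10.0,
    "10 mM dNTPs": 2.0,
    "forward primer (10 uM)": 2.0,
    "reverse primer (10 uM)": 2.0,
    "water": 81.5,
    "Pfu DNA polymerase": 1.5,
    "template DNA": 1.0,
}


def final_concentration(stock, volume_ul, total_ul):
    """Dilution of a stock into the full reaction volume (C1 * V1 = C2 * V2)."""
    return stock * volume_ul / total_ul


total_ul = sum(REACTION_UL.values())
print(f"total volume: {total_ul} ul")
print(f"buffer: {final_concentration(10, 10.0, total_ul)}X")
print(f"dNTPs: {final_concentration(10, 2.0, total_ul)} mM")
print(f"each primer: {final_concentration(10, 2.0, total_ul)} uM")
print(f"TbRa3 template: {final_concentration(70, 1.0, total_ul)} ng/ul")
print(f"38 kD or Tb38-1 template: {final_concentration(50, 1.0, total_ul)} ng/ul")
```

Under these figures the buffer is at 1X, the dNTPs at 0.2 mM, each primer at 0.2 µM, and the template at 0.7 or 0.5 ng/µl.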

The TbRa3 PCR fragment was digested with NdeI and EcoRI and cloned
directly into pT7 A L2 IL 1 vector using NdeI and EcoRI sites. The 38 kD PCR fragment was
digested with Sse8387I, treated with T4 DNA polymerase to make blunt ends and then
digested with EcoRI for direct cloning into the pT7 A L2Ra3-1 vector which was digested with
StuI and EcoRI. The 38-1 PCR fragment was digested with Eco47III and EcoRI and directly
subcloned into pT7 A L2Ra3/38kD-17 digested with the same enzymes. The whole fusion was
then transferred to pET28b using NdeI and EcoRI sites. The fusion construct was
confirmed by DNA sequencing.

The expression construct was transformed into BLR pLys S E. coli (Novagen,
Madison, WI) and grown overnight in LB broth with kanamycin (30 µg/ml) and
chloramphenicol (34 µg/ml). This culture (12 ml) was used to inoculate 500 ml 2XYT with
the same antibiotics and the culture was induced with IPTG at an OD560 of 0.44 to a final
concentration of 1.2 mM. Four hours post-induction, the bacteria were harvested and
sonicated in 20 mM Tris (8.0), 100 mM NaCl, 0.1% DOC, 20 µg/ml Leupeptin, 20 mM
PMSF followed by centrifugation at 26,000 X g. The resulting pellet was resuspended in 8 M
urea, 20 mM Tris (8.0), 100 mM NaCl and bound to Pro-bond nickel resin (Invitrogen,
Carlsbad, CA). The column was washed several times with the above buffer then eluted with
an imidazole gradient (50 mM, 100 mM, 500 mM imidazole was added to 8 M urea, 20 mM
Tris (8.0), 100 mM NaCl). The eluates containing the protein of interest were then dialyzed
against 10 mM Tris (8.0).

The DNA and amino acid sequences for the resulting fusion protein
(hereinafter referred to as TbRa3-38 kD-Tb38-1) are provided in SEQ ID NO: 152 and 153,
respectively.

A fusion protein containing the two antigens TbH-9 and Tb38-1 (hereinafter
referred to as TbH9-Tb38-1) without a hinge sequence, was prepared using a similar
procedure to that described above. The DNA sequence for the TbH9-Tb38-1 fusion protein is
provided in SEQ ID NO: 156.

The ability of the fusion protein TbH9-Tb38-1 to induce T cell proliferation
and IFN-γ production in PBMC preparations was examined using the protocol described
above in Example 1. PBMC from three donors were employed: one who had been previously
shown to respond to TbH9 but not Tb38-1 (donor 131); one who had been shown to respond
to Tb38-1 but not TbH9 (donor 184); and one who had been shown to respond to both
antigens (donor 201). The results of these studies (Figs. 5-7, respectively) demonstrate the
functional activity of both the antigens in the fusion protein.

A fusion protein containing TbRa3, the antigen 38 kD, Tb38-1 and DPEP was
prepared as follows.

Each of the DNA constructs TbRa3, 38 kD and Tb38-1 was modified by PCR
and cloned into vectors essentially as described above, with the primers PDM-69 (SEQ ID
NO: 150) and PDM-83 (SEQ ID NO: 205) being used for amplification of the Tb38-1A
fragment. Tb38-1A differs from Tb38-1 by a DraI site at the 3' end of the coding region that
keeps the final amino acid intact while creating a blunt restriction site that is in frame. The
TbRa3/38kD/Tb38-1A fusion was then transferred to pET28b using NdeI and EcoRI sites.

DPEP DNA was used to perform PCR using the primers PDM-84 and PDM-85
(SEQ ID NO: 206 and 207, respectively) and 1 µl DNA at 50 ng/µl. Denaturation at 94°C
was performed for 2 min, followed by 10 cycles of 96°C for 15 sec, 68°C for 15 sec and
72°C for 1.5 min; 30 cycles of 96°C for 15 sec, 64°C for 15 sec and 72°C for 1.5 min; and
finally by 72°C for 4 min. The DPEP PCR fragment was digested with EcoRI and Eco72I
and cloned directly into the pET28Ra3/38kD/38-1A construct which was digested with DraI
and EcoRI. The fusion construct was confirmed to be correct by DNA sequencing.
Recombinant protein was prepared as described above. The DNA and amino acid sequences
for the resulting fusion protein (hereinafter referred to as TbF-2) are provided in SEQ ID NO:
208 and 209, respectively.
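The two-stage cycling program described for DPEP (the same program given for Tb38-1) can be written down as data and its total hold time added up. The Python sketch below is illustrative only; it ignores temperature ramp times, which the text does not specify, and the function name program_seconds is hypothetical.

```python
def program_seconds(initial_s, stages, final_s):
    """Sum the hold times of a cycling program.

    `stages` is a list of (number_of_cycles, [step_seconds, ...]) tuples.
    Ramp times between temperatures are not included (an assumption,
    since the text gives hold times only).
    """
    cycling = sum(cycles * sum(steps) for cycles, steps in stages)
    return initial_s + cycling + final_s


# 94 C for 2 min; 10 cycles of 96 C/15 s, 68 C/15 s, 72 C/1.5 min;
# 30 cycles of 96 C/15 s, 64 C/15 s, 72 C/1.5 min; then 72 C for 4 min.
dpep_seconds = program_seconds(
    initial_s=120,
    stages=[(10, [15, 15, 90]), (30, [15, 15, 90])],
    final_s=240,
)
print(f"{dpep_seconds} s = {dpep_seconds / 60:.0f} min of hold time")
```

On these figures the program comprises 40 cycles in total and 86 minutes of hold time.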

The reactivity of the fusion protein TbF-2 with sera from M. tuberculosis-infected
patients was examined by ELISA using the protocol described above. The results of
these studies (Table 11) demonstrate that all four antigens function independently in the
fusion protein.

TABLE 11

Reactivity of TbF-2 Fusion Recombinant with TB and Normal Sera

                     TbF                   TbF-2           ELISA Reactivity
Serum ID     Status  OD450         Status  OD450   Status  38 kD   TbRa3   Tb38-1  DPEP

[illegible]  TB      0.57          +       0.321           -       +       -       +
[illegible]  TB      0.601         +       0.396           +       +       +       -
[illegible]  TB      0.494                 0.404           +       +       ±       -
[illegible]  TB      1.502         +       1.292   +       +       +       +       ±
[illegible]  TB      1.806                 1.666           ±       ±       +       -
[illegible]  TB      2.862                 2.468           +       +               -
[illegible]  TB      2.443                 1.722   +       +       +       +       -
[illegible]  TB      2.871         +       2.575   +       +       +       +       -
[illegible]  TB      0.691         +       0.971   +       -       ±       +       -
[illegible]  TB      0.875         +       0.732   +       -       ±       +       -
[illegible]  TB      1.632         +       1.394   +       +       ±       ±       -
[illegible]  TB      1.491         +       1.979   +       +       ±       -
[illegible]  TB      3.182         +       3.045   +       +       ±       -       -
[illegible]  TB      3.644         +       3.578   +       +       +       +       -
[illegible]  TB      3.332         +       2.916   +       +       +       -       -
[illegible]  TB      3.696         +       3.716   +       -       +       -       +
[illegible]  TB      3.243         +       2.56    +       -       -       +       -
[illegible]  TB      1.249         +       1.234   +       +       -       -       -
[illegible]  TB      1.373         +       1.17    +       -       +       -       -
[illegible]  TB      3.708         +       3.355   +       -       -       +       -
[illegible]  TB      1.663         +       1.399   +       -       -       +       -
[illegible]  TB      1.163                 0.92    +       +       -       -       -
[illegible]  TB      1.709                 1.453           -       +       -       -
[illegible]  TB      [illegible]           0.461   +       -       ±       -       +
[illegible]  TB      [illegible]           0.2                     -       -       ±
[illegible]  TB      [illegible]           0.469                                   ±
[illegible]  TB      [illegible]   +       2.392   +       ±
411004       TB      0.306         +       0.874   +               +               +
421004       TB      0.357         +       1.456                   +               +
528004       TB      0.047                 0.196                                   +
A6-87        Normal  0.094                 0.063
A6-88        Normal  0.214                 0.19
A6-89        Normal  0.248                 0.125
A6-90        Normal  0.179                 0.206
A6-91        Normal  0.135                 0.151
A6-92        Normal  0.064                 0.097
A6-93        Normal  0.072                 0.098
A6-94        Normal  0.072                 0.064
A6-95        Normal  0.125                 0.159
A6-96        Normal  0.121                 0.12

Cut-off              0.284                 0.266

One of skill in the art will appreciate that the order of the individual antigens
within the fusion protein may be changed and that comparable activity would be expected 
provided each of the epitopes is still functionally available. In addition, truncated forms of 
the proteins containing active epitopes may be used in the construction of fusion proteins. 
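The cut-off row of Table 11 (OD450 of 0.284 for TbF and 0.266 for TbF-2) can be applied to individual readings as in the following Python sketch. It is an illustration only: the strict "greater than" comparison and the function name is_reactive are assumptions, since the text does not state how a value equal to the cut-off is scored, and only rows with legible serum IDs are used.

```python
# Cut-off values (OD450) from the last row of Table 11.
CUTOFF = {"TbF": 0.284, "TbF-2": 0.266}

# Selected rows of Table 11: serum ID -> (status, TbF OD450, TbF-2 OD450).
SERA = {
    "411004": ("TB", 0.306, 0.874),
    "421004": ("TB", 0.357, 1.456),
    "528004": ("TB", 0.047, 0.196),
    "A6-87": ("Normal", 0.094, 0.063),
    "A6-88": ("Normal", 0.214, 0.19),
}


def is_reactive(od450, antigen):
    """Score a reading as reactive when it exceeds the antigen's cut-off.

    The strict '>' is an assumption; the specification does not say how
    a reading exactly at the cut-off would be classified.
    """
    return od450 > CUTOFF[antigen]


for serum_id, (status, tbf, tbf2) in SERA.items():
    print(serum_id, status, is_reactive(tbf, "TbF"), is_reactive(tbf2, "TbF-2"))
```

Applied to these rows, sera 411004 and 421004 score as reactive with both fusion proteins, while serum 528004 and the two normal sera fall below both cut-offs.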

From the foregoing, it will be appreciated that, although specific embodiments 
of the invention have been described herein for the purpose of illustration, various 
modifications may be made without deviating from the spirit and scope of the invention.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANTS: Reed, Steven G. 

Skeiky, Yasir A.W. 
Dillon, Davin C. 
Campos-Neto, Antonio 
Houghton, Raymond 
Vedvick, Thomas S. 
Twardzik, Daniel R. 
Lodes, Michael J. 

(ii) TITLE OF INVENTION: COMPOUNDS AND METHODS FOR IMMUNOTHERAPY 
AND DIAGNOSIS OF TUBERCULOSIS 

(iii) NUMBER OF SEQUENCES: 214 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SEED and BERRY LLP 

(B) STREET: 6300 Columbia Center, 701 Fifth Avenue 

(C) CITY: Seattle 

(D) STATE: Washington 

(E) COUNTRY: USA 

(F) ZIP: 98104-7092

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 01-OCT-1997
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Maki, David J. 

(B) REGISTRATION NUMBER: 31,392 

(C) REFERENCE/DOCKET NUMBER: 210121.411C7

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (206) 622-4900 

(B) TELEFAX: (206) 682-6031 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 766 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGAGCCATGT GACGGTAGTT GCGGTGCTCG GGGTACTCGG CGTATTTCTG ATGGTCTCGG 420

CGACGTTTAA CAAGCCCAGC GCCTATTCGA CCGGTTGGGC ATTGTGGGTT GTGTTGGCTT 480

TCATCGTGTT CCAGGCGGTT GCGGCAGTCC TGGCGCTCTT GGTGGAGACC GGCGCTATCA 540

CCGCGCCGGC GCCGCGGCCC AAGTTCGACC CGTATGGACA GTACGGGCGG TACGGGCAGT 600

ACGGGCAGTA CGGGGTGCAG CCGGGTGGGT ACTACGGTCA GCAGGGTGCT CAGCAGGCCG 660

CGGGACTGCA GTCGCCCGGC CCGCAGCAGT CTCCGCAGCC TCCCGGATAT GGGTCGCAGT 720

ACGGCGGCTA TTCGTCCAGT CCGAGCCAAT CGGGCAGTGG ATACACTGCT CAGCCCCCGG 780

CCCAGCCGCC GGCGCAGTCC GGGTCGCAAC AATCGCACCA GGGCCCATCC ACGCCACCTA 840

CCGGCTTTCC GAGCTTCAGC CCACCACCAC CGGTCAGTGC CGGGACGGGG TCGCAGGCTG 900

GTTCGGCTCC AGTCAACTAT TCAAACCCCA GCGGGGGCGA GCAGTCGTCG TCCCCCGGGG 960

GGGCGCCGGT CTAACCGGGC GTTCCCGCGT CCGGTCGCGC GTGTGCGCGA AGAGTGAACA 1020

GGGTGTCAGC AAGCGCGGAC GATCCTCGTG CCGAATTC 1058



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 327 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 



CGGCACGAGA GACCGATGCC GCTACCCTCG CGCAGGAGGC AGGTAATTTC GAGCGGATCT 60

CCGGCGACCT GAAAACCCAG ATCGACCAGG TGGAGTCGAC GGCAGGTTCG TTGCAGGGCC 120

AGTGGCGCGG CGCGGCGGGG ACGGCCGCCC AGGCCGCGGT GGTGCGCTTC CAAGAAGCAG 180

CCAATAAGCA GAAGCAGGAA CTCGACGAGA TCTCGACGAA TATTCGTCAG GCCGGCGTCC 240

AATACTCGAG GGCCGACGAG GAGCAGCAGC AGGCGCTGTC CTCGCAAATG GGCTTCTGAC 300

CCGCTAATAC GAAAAGAAAC GGAGCAA 327



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CGGTCGCGAT GATGGCGTTG TCGAACGTGA CCGATTCTGT ACCGCCGTCG TTGAGATCAA 60
CCAACAACGT GTTGGCGTCG GCAAATGTGC CGNACCCGTG GATCTCGGTG ATCTTGTTCT 120
TCTTCATCAG GAAGTGCACA CCGGCCACCC TGCCCTCGGN TACCTTTCGG 170
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 127 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GATCCGGCGG CACGGGGGGT GCCGGCGGCA GCACCGCTGG CGCTGGCGGC AACGGCGGGG 
CCGGGGGTGG CGGCGGAACC GGTGGGTTGC TCTTCGGCAA CGGCGGTGCC GGCGGGCACG 
GGGCCGT 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 81 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
CGGCGGCAAG GGCGGCACCG CCGGCAACGG GAGCGGCGCG GCCGGCGGCA ACGGCGGCAA 
CGGCGGCTCC GGCCTCAACG G 
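A transcribed sequence can be checked against the LENGTH field of its entry by stripping the block spacing and counting residues. The Python sketch below applies this to SEQ ID NO: 49 as printed above; the helper name clean_sequence is hypothetical and the check is offered only as an illustration of how such a listing might be verified.

```python
def clean_sequence(listing_lines):
    """Join listing rows and keep only nucleotide letters, dropping the
    spaces between 10-base blocks and any trailing position numbers."""
    joined = "".join(listing_lines).upper()
    return "".join(ch for ch in joined if ch in "ACGTN")


# Rows of SEQ ID NO: 49 as printed in the sequence listing.
SEQ_49_ROWS = [
    "CGGCGGCAAG GGCGGCACCG CCGGCAACGG GAGCGGCGCG GCCGGCGGCA ACGGCGGCAA",
    "CGGCGGCTCC GGCCTCAACG G",
]
DECLARED_LENGTH = 81  # "(A) LENGTH: 81 base pairs"

sequence = clean_sequence(SEQ_49_ROWS)
print(len(sequence), len(sequence) == DECLARED_LENGTH)
```

The same helper tolerates rows that end in a position number, because digits are discarded along with the spaces.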
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 149 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GATCAGGGCT GGCCGGCTCC GGCCAGAAGG GCGGTAACGG AGGAGCTGCC GGATTGTTTG

85 90 95

Pro Ala Ala Gly Gly Gly Ala 
100 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 88 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

Val Gln Cys Arg Val Trp Leu Glu Ile Gln Trp Arg Gly Met Leu Gly
1 5 10 15

Ala Asp Gln Ala Arg Ala Gly Gly Pro Ala Arg Ile Trp Arg Glu His
20 25 30

Ser Met Ala Ala Met Lys Pro Arg Thr Gly Asp Gly Pro Leu Glu Ala
35 40 45

Thr Lys Glu Gly Arg Gly Ile Val Met Arg Val Pro Leu Glu Gly Gly
50 55 60

Gly Arg Leu Val Val Glu Leu Thr Pro Asp Glu Ala Ala Ala Leu Gly
65 70 75 80

Asp Glu Leu Lys Gly Val Thr Ser
85

(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 95 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

Thr Asp Ala Ala Thr Leu Ala Gln Glu Ala Gly Asn Phe Glu Arg Ile
1 5 10 15

Ser Gly Asp Leu Lys Thr Gln Ile Asp Gln Val Glu Ser Thr Ala Gly
20 25 30

Ser Leu Gln Gly Gln Trp Arg Gly Ala Ala Gly Thr Ala Ala Gln Ala
35 40 45

Ala Val Val Arg Phe Gln Glu Ala Ala Asn Lys Gln Lys Gln Glu Leu
50 55 60

Asp Glu Ile Ser Thr Asn Ile Arg Gln Ala Gly Val Gln Tyr Ser Arg
65 70 75 80

Ala Asp Glu Glu Gln Gln Gln Ala Leu Ser Ser Gln Met Gly Phe
85 90 95

(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: 

Met Thr Gln Ser Gln Thr Val Thr Val Asp Gln Gln Glu Ile Leu Asn
1 5 10 15

Arg Ala Asn Glu Val Glu Ala Pro Met Ala Asp Pro Pro Thr Asp Val
20 25 30

Pro Ile Thr Pro Cys Glu Leu Thr Xaa Xaa Lys Asn Ala Ala Gln Gln
35 40 45

Xaa Val Leu Ser Ala Asp Asn Met Arg Glu Tyr Leu Ala Ala Gly Ala
50 55 60

Lys Glu Arg Gln Arg Leu Ala Thr Ser Leu Arg Asn Ala Ala Lys Xaa
65 70 75 80

Tyr Gly Glu Val Asp Glu Glu Ala Ala Thr Ala Leu Asp Asn Asp Gly
85 90 95

Glu Gly Thr Val Gln Ala Glu Ser Ala Gly Ala Val Gly Gly Asp Ser
100 105 110

Ser Ala Glu Leu Thr Asp Thr Pro Arg Val Ala Thr Ala Gly Glu Pro
115 120 125

Asn Phe Met Asp Leu Lys Glu Ala Ala Arg Lys Leu Glu Thr Gly Asp
130 135 140

Gln Gly Ala Ser Leu Ala His Xaa Gly Asp Gly Trp Asn Thr Xaa Thr
145 150 155 160

Leu Thr Leu Gln Gly Asp
165

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

Gln Gln Gly Ala Gln Gln Ala Ala Gly Leu Gln Ser Pro Gly Pro Gln
195 200 205

Gln Ser Pro Gln Pro Pro Gly Tyr Gly Ser Gln Tyr Gly Gly Tyr Ser
210 215 220

Ser Ser Pro Ser Gln Ser Gly Ser Gly Tyr Thr Ala Gln Pro Pro Ala
225 230 235 240

Gln Pro Pro Ala Gln Ser Gly Ser Gln Gln Ser His Gln Gly Pro Ser
245 250 255

Thr Pro Pro Thr Gly Phe Pro Ser Phe Ser Pro Pro Pro Pro Val Ser
260 265 270

Ala Gly Thr Gly Ser Gln Ala Gly Ser Ala Pro Val Asn Tyr Ser Asn
275 280 285

Pro Ser Gly Gly Glu Gln Ser Ser Ser Pro Gly Gly Ala Pro Val
290 295 300

(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 

Gly Cys Gly Glu Thr Asp Ala Ala Thr Leu Ala Gln Glu Ala Gly Asn
1 5 10 15

Phe Glu Arg Ile Ser Gly Asp Leu Lys Thr Gln Ile
20 25

(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 

Asp Gln Val Glu Ser Thr Ala Gly Ser Leu Gln Gly Gln Trp Arg Gly
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 95: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 

Gly Cys Gly Ser Thr Ala Gly Ser Leu Gln Gly Gln Trp Arg Gly Ala
1 5 10 15

Ala Gly Thr Ala Ala Gln Ala Ala Val Val Arg
20 25

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 

Gly Cys Gly Gly Thr Ala Ala Gln Ala Ala Val Val Arg Phe Gln Glu
1 5 10 15

Ala Ala Asn Lys Gln Lys Gln Glu Leu Asp Glu
20 25

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 

Gly Cys Gly Ala Asn Lys Gln Lys Gln Glu Leu Asp Glu Ile Ser Thr
1 5 10 15

Asn Ile Arg Gln Ala Gly Val Gln Tyr Ser Arg
20 25

(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 

Gly Cys Gly Ile Arg Gln Ala Gly Val Gln Tyr Ser Arg Ala Asp Glu
1 5 10 15

Glu Gln Gln Gln Ala Leu Ser Ser Gln Met Gly Phe
20 25



(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 507 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 



ATGAAGATGG TGAAATCGAT CGCCGCAGGT CTGACCGCCG CGGCTGCAAT CGGCGCCGCT 60

GCGGCCGGTG TGACTTCGAT CATGGCTGGC GGCCCGGTCG TATACCAGAT GCAGCCGGTC 120

GTCTTCGGCG CGCCACTGCC GTTGGACCCG GCATCCGCCC CTGACGTCCC GACCGCCGCC 180

CAGTTGACCA GCCTGCTCAA CAGCCTCGCC GATCCCAACG TGTCGTTTGC GAACAAGGGC 240

AGTCTGGTCG AGGGCGGCAT CGGGGGCACC GAGGCGCGCA TCGCCGACCA CAAGCTGAAG 300

AAGGCCGCCG AGCACGGGGA TCTGCCGCTG TCGTTCAGCG TGACGAACAT CCAGCCGGCG 360

GCCGCCGGTT CGGCCACCGC CGACGTTTCC GTCTCGGGTC CGAAGCTCTC GTCGCCGGTC 420

ACGCAGAACG TCACGTTCGT GAATCAAGGC GGCTGGATGC TGTCACGCGC ATCGGCGATG 480

GAGTTGCTGC AGGCCGCAGG GAACTGA 507



(2) INFORMATION FOR SEQ ID NO: 100: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 168 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Trp Ala Gln Asp Ala Ala Ala Met Phe Gly Tyr Ala Ala Thr Ala Ala
145 150 155 160

Thr Ala Thr Glu Ala Leu Leu Pro Phe Glu Asp Ala Pro Leu Ile Thr
165 170 175

Asn Pro Gly Gly Leu Leu Glu Gln Ala Val Ala Val Glu Glu Ala Ile
180 185 190

Asp Thr Ala Ala Ala Asn Gln Leu Met Asn Asn Val Pro Gln Ala Leu
195 200 205

Gln Gln Leu Ala Gln Pro Thr Lys Ser Ile Trp Pro Phe Asp Gln Leu
210 215 220

Ser Glu Leu Trp Lys Ala Ile Ser Pro His Leu Ser Pro Leu Ser Asn
225 230 235 240

Ile Val Ser Met Leu Asn Asn His Val Ser Met Thr Asn Ser Gly Val
245 250 255

Ser Met Ala Ser Thr Leu His Ser Met Leu Lys Gly Phe Ala Pro Ala
260 265 270

Ala Ala Gln Ala Val Glu Thr Ala Ala Gln Asn Gly Val Gln Ala Met
275 280 285

Ser Ser Leu Gly Ser Gln Leu Gly Ser Ser Leu Gly Ser Ser Gly Leu
290 295 300

Gly Ala Gly Val Ala Ala Asn Leu Gly Arg Ala Ala Ser Val Gly Ser
305 310 315 320

Leu Ser Val Pro Gln Ala Trp Ala Ala Ala Asn Gln Ala Val Thr Pro
325 330 335

Ala Ala Arg Ala Leu Pro Leu Thr Ser Leu Thr Ser Ala Ala Gln Thr
340 345 350

Ala Pro Gly His Met Leu Gly Gly Leu Pro Leu Gly Gln Leu Thr Asn
355 360 365

Ser Gly Gly Gly Phe Gly Gly Val Ser Asn Ala Leu Arg Met Pro Pro
370 375 380

Arg Ala Tyr Val Met Pro Arg Val Pro Ala Ala Gly
385 390 395

(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 1616 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:

[illegible] [illegible] [illegible] ACGCAATGCC ACCGGAGTAA ATACCGCACG 60

[illegible] [illegible] [illegible] GCTTGCGGCG GCCGCGGGAT GGCAGACGCT 120

[illegible] [illegible] [illegible] [illegible] CGCCTGAACT CTCTGGGAGA 180

[illegible] [illegible] [illegible] GCTTGCGGCT GCAACGCCGA TGGTGGTCTG 240

[illegible] [illegible] [illegible] [illegible] CAGGCGACGG CGCAAGCCGC 300

[illegible] [illegible] [illegible] GTCGCTGCCG GAGATCGCCG CCAACCACAT 360

[illegible] [illegible] [illegible] CTTCGGTATC AACACGATCC CGATCGCGTT 420

[illegible] [illegible] [illegible] GAACCAGGCA GCCCTGGCAA TGGAGGTCTA 480

[illegible] [illegible] [illegible] CGAGAAGCTC GAGCCGATGG CGTCGATCCT 540

[illegible] [illegible] [illegible] CCCGATCTTC GGAATGCCCT CCCCTGGCAG 600

[illegible] [illegible] [illegible] GGCTACCCAG ACCCTCGGCC AACTGGGTGA 660

[illegible] [illegible] [illegible] GCCGCTGCAG CAGGTGACGT CGTTGTTCAG 720

[illegible] [illegible] [illegible] AGCCGACGAG GAAGCCGCGC AGATGGGCCT 780

[illegible] [illegible] [illegible] GCTGGCTGGT GGATCAGGCC CCAGCGCGGG 840

[illegible] [illegible] [illegible] TGGCGCAGGT GGGTCGTTGA CCCGCACGCC 900

[illegible] [illegible] [illegible] TGCCCCCTCG GTGATGCCGG CGGCTGCTGC 960

[illegible] [illegible] [illegible] GGTGGGTGCG GGAGCGATGG GCCAGGGTGC 1020

[illegible] [illegible] [illegible] GGTCGCGCCG GCACCGCTCG CGCAGGAGCG 1080

[illegible] [illegible] [illegible] AGAGGACGAC TGGTGAGCTC CCGTAATGAC 1140

[illegible] [illegible] [illegible] [illegible] ATTTTGGCGA GGAAGGTAAA 1200

GAGAGAAAGT AGTCCAGCAT GGCAGAGATG AAGACCGATG CCGCTACCCT CGCGCAGGAG 1260

GCAGGTAATT TCGAGCGGAT CTCCGGCGAC CTGAAAACCC AGATCGACCA GGTGGAGTCG 1320

ACGGCAGGTT CGTTGCAGGG CCAGTGGCGC GGCGCGGCGG GGACGGCCGC CCAGGCCGCG 1380

GTGGTGCGCT TCCAAGAAGC AGCCAATAAG CAGAAGCAGG AACTCGACGA GATCTCGACG 1440

AATATTCGTC AGGCCGGCGT CCAATACTCG AGGGCCGACG AGGAGCAGCA GCAGGCGCTG 1500

TCCTCGCAAA TGGGCTTCTG ACCCGCTAAT ACGAAAAGAA ACGGAGCAAA AACATGACAG 1560

AGCAGCAGTG GAATTTCGCG GGTATCGAGG CCGCGGCAAG CGCAATCCAG GGAAAT 1616



(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 432 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:

CTAGTGGATG GGACCATGGC CATTTTCTGC AGTCTCACTG CCTTCTGTGT TGACATTTTG 60 

GCACGCCGGC GGAAACGAAG CACTGGGGTC GAAGAACGGC TGCGCTGCCA TATCGTCCGG 120 

AGCTTCCATA CCTTCGTGCG GCCGGAAGAG CTTGTCGTAG TCGGCCGCCA TGACAACCTC 180

TCAGAGTGCG CTCAAACGTA TAAACACGAG AAAGGGCGAG ACCGACGGAA GGTCGAACTC 240

GCCCGATCCC GTGTTTCGCT ATTCTACGCG AACTCGGCGT TGCCCTATGC GAACATCCCA 300 

GTGACGTTGC CTTCGGTCGA AGCCATTGCC TGACCGGCTT CGCTGATCGT CCGCGCCAGG 360 

TTCTGCAGCG CGTTGTTCAG CTCGGTAGCC GTGGCGTCCC ATTTTTGCTG GACACCCTGG 420

TACGCCTCCG AA 432 
(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 

Met Leu Trp His Ala Met Pro Pro Glu Xaa Asn Thr Ala Arg Leu Met
1 5 10 15

Ala Gly Ala Gly Pro Ala Pro Met Leu Ala Ala Ala Ala Gly Trp Gln
20 25 30

Thr Leu Ser Ala Ala Leu Asp Ala Gln Ala Val Glu Leu Thr Ala Arg
35 40 45

Leu Asn Ser Leu Gly Glu Ala Trp Thr Gly Gly Gly Ser Asp Lys Ala
50 55 60

Leu Ala Ala Ala Thr Pro Met Val Val Trp Leu Gln Thr Ala Ser Thr
65 70 75 80

Gln Ala Lys Thr Arg Ala Met Gln Ala Thr Ala Gln Ala Ala Ala Tyr
85 90 95

Thr Gln Ala Met Ala Thr Thr Pro Ser Leu Pro Glu Ile Ala Ala Asn
100 105 110

His Ile Thr Gln Ala Val Leu Thr Ala Thr Asn Phe Phe Gly Ile Asn
115 120 125

Thr Ile Pro Ile Ala Leu Thr Glu Met Asp Tyr Phe Ile Arg Met Trp
130 135 140

Asn Gln Ala Ala Leu Ala Met Glu Val Tyr Gln Ala Glu Thr Ala Val
145 150 155 160

Asn Thr Leu Phe Glu Lys Leu Glu Pro Met Ala Ser Ile Leu Asp Pro
165 170 175

Gly Ala Ser Gln Ser Thr Thr Asn Pro Ile Phe Gly Met Pro Ser Pro
180 185 190

Gly Ser Ser Thr Pro Val Gly Gln Leu Pro Pro Ala Ala Thr Gln Thr
195 200 205

Leu Gly Gln Leu Gly Glu Met Ser Gly Pro Met Gln Gln Leu Thr Gln
210 215 220

Pro Leu Gln Gln Val Thr Ser Leu Phe Ser Gln Val Gly Gly Thr Gly
225 230 235 240

Gly Gly Asn Pro Ala Asp Glu Glu Ala Ala Gln Met Gly Leu Leu Gly
245 250 255

Thr Ser Pro Leu Ser Asn His Pro Leu Ala Gly Gly Ser Gly Pro Ser
260 265 270

Ala Gly Ala Gly Leu Leu Arg Ala Glu Ser Leu Pro Gly Ala Gly Gly
275 280 285

Ser Leu Thr Arg Thr Pro Leu Met Ser Gln Leu Ile Glu Lys Pro Val
290 295 300

Ala Pro Ser Val Met Pro Ala Ala Ala Ala Gly Ser Ser Ala Thr Gly
305 310 315 320

Gly Ala Ala Pro Val Gly Ala Gly Ala Met Gly Gln Gly Ala Gln Ser
325 330 335

Gly Gly Ser Thr Arg Pro Gly Leu Val Ala Pro Ala Pro Leu Ala Gln
340 345 350

Glu Arg Glu Glu Asp Asp Glu Asp Asp Trp Asp Glu Glu Asp Asp Trp
355 360 365



(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 

Met Ala Glu Met Lys Thr Asp Ala Ala Thr Leu Ala Gln Glu Ala Gly
1 5 10 15

Asn Phe Glu Arg Ile Ser Gly Asp Leu Lys Thr Gln Ile Asp Gln Val
20 25 30

Glu Ser Thr Ala Gly Ser Leu Gln Gly Gln Trp Arg Gly Ala Ala Gly
35 40 45

Thr Ala Ala Gln Ala Ala Val Val Arg Phe Gln Glu Ala Ala Asn Lys
50 55 60

Gln Lys Gln Glu Leu Asp Glu Ile Ser Thr Asn Ile Arg Gln Ala Gly
65 70 75 80

Val Gln Tyr Ser Arg Ala Asp Glu Glu Gln Gln Gln Ala Leu Ser Ser
85 90 95

Gln Met Gly Phe
100

(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 396 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
GATCTCCGGC GACCTGAAAA CCCAGATCGA CCAGGTGGAG TCGACGGCAG GTTCGTTGCA 
GGGCCAGTGG CGCGGCGCGG CGGGGACGGC CGCCCAGGCC GCGGTGGTGC GCTTCCAAGA 
AGCAGCCAAT AAGCAGAAGC AGGAACTCGA CGAGATCTCG ACGAATATTC GTCAGGCCGG 
CGTCCAATAC TCGAGGGCCG ACGAGGAGCA GCAGCAGGCG CTGTCCTCGC AAATGGGCTT
CTGACCCGCT AATACGAAAA GAAACGGAGC AAAAACATGA CAGAGCAGCA GTGGAATTTC
GCGGGTATCG AGGCCGCGGC AAGCGCAATC CAGGGAAATG TCACGTCCAT TCATTCCCTC 
CTTGACGAGG GGAAGCAGTC CCTGACCAAG CTCGCA 
(2) INFORMATION FOR SEQ ID NO: 117:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 80 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 
Ile Ser Gly Asp Leu Lys Thr Gln Ile Asp Gln Val Glu Ser Thr Ala
1 5 10 15

Gly Ser Leu Gln Gly Gln Trp Arg Gly Ala Ala Gly Thr Ala Ala Gln
20 25 30

Ala Ala Val Val Arg Phe Gln Glu Ala Ala Asn Lys Gln Lys Gln Glu
35 40 45

Leu Asp Glu Ile Ser Thr Asn Ile Arg Gln Ala Gly Val Gln Tyr Ser
50 55 60

Arg Ala Asp Glu Glu Gln Gln Gln Ala Leu Ser Ser Gln Met Gly Phe
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 
GTGGATCCCG ATCCCGTGTT TCGCTATTCT ACGCGAACTC GGCGTTGCCC TATGCGAACA 
TCCCAGTGAC GTTGCCTTCG GTCGAAGCCA TTGCCTGACC GGCTTCGCTG ATCGTCCGCG 
CCAGGTTCTG CAGCGCGTTG TTCAGCTCGG TAGCCGTGGC GTCCCATTTT TGCTGGACAC 
CCTGGTACGC CTCCGAACCG CTACCGCCCC AGGCCGCTGC GAGCTTGGTC AGGGACTGCT 
TCCCCTCGTC AAGGAGGGAA TGAATGGACG TGACATTTCC CTGGATTGCG CTTGCCGCGG 
CCTCGATACC CGCGAAATTC CACTGCTGCT CTGTCATGTT TTTGCTCCGT TTCTTTTCGT 
ATTAGCGGGT CAGAAGCCCA TTTGCGA 
(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 272 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 
CGGCACGAGG ATCTCGGTTG GCCCAACGGC GCTGGCGAGG GCTCCGTTCC GGGGGCGAGC 
TGCGCGCCGG ATGCTTCCTC TGCCCGCAGC CGCGCCTGGA TGGATGGACC AGTTGCTACC 
TTCCCGACGT TTCGTTCGGT GTCTGTGCGA TAGCGGTGAC CCCGGCGCGC ACGTCGGGAG
TGTTGGGGGG CAGGCCGGGT CGGTGGTTCG GCCGGGGACG CAGACGGTCT GGACGGAACG
GGCGGGGGTT CGCCGATTGG CATCTTTGCC CA 
(2) INFORMATION FOR SEQ ID NO: 120: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 

Asp Pro Val Asp Ala Val Ile Asn Thr Thr Cys Asn Tyr Gly Gln Val
1 5 10 15

Val Ala Ala Leu 
20 

(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 

Ala Val Glu Ser Gly Met Leu Ala Leu Gly Thr Pro Ala Pro Ser 
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 

Ala Ala Met Lys Pro Arg Thr Gly Asp Gly Pro Leu Glu Ala Ala Lys 
1 5 10 15

Glu Gly Arg

GAGAGAATTC TCAGAAGCCC ATTTGCGAGG ACA
(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 152..1273



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152: 

TGTTCTTCGA CGGCAGGCTG GTGGAGGAAG GGCCCACCGA ACAGCTGTTC TCCTCGCCGA 60 

AGCATGCGGA AACCGCCCGA TACGTCGCCG GACTGTCGGG GGACGTCAAG GACGCCAAGC 120 

GCGGAAATTG AAGAGCACAG AAAGGTATGG C GTG AAA ATT CGT TTG CAT ACG 172
Val Lys Ile Arg Leu His Thr
1 5

CTG TTG GCC GTG TTG ACC GCT GCG CCG CTG CTG CTA GCA GCG GCG GGC 220
Leu Leu Ala Val Leu Thr Ala Ala Pro Leu Leu Leu Ala Ala Ala Gly
10 15 20

TGT GGC TCG AAA CCA CCG AGC GGT TCG CCT GAA ACG GGC GCC GGC GCC 268
Cys Gly Ser Lys Pro Pro Ser Gly Ser Pro Glu Thr Gly Ala Gly Ala
25 30 35

GGT ACT GTC GCG ACT ACC CCC GCG TCG TCG CCG GTG ACG TTG GCG GAG 316
Gly Thr Val Ala Thr Thr Pro Ala Ser Ser Pro Val Thr Leu Ala Glu
40 45 50 55

ACC GGT AGC ACG CTG CTC TAC CCG CTG TTC AAC CTG TGG GGT CCG GCC 364
Thr Gly Ser Thr Leu Leu Tyr Pro Leu Phe Asn Leu Trp Gly Pro Ala
60 65 70

TTT CAC GAG AGG TAT CCG AAC GTC ACG ATC ACC GCT CAG GGC ACC GGT 412
Phe His Glu Arg Tyr Pro Asn Val Thr Ile Thr Ala Gln Gly Thr Gly
75 80 85

TCT GGT GCC GGG ATC GCG CAG GCC GCC GCC GGG ACG GTC AAC ATT GGG 460
Ser Gly Ala Gly Ile Ala Gln Ala Ala Ala Gly Thr Val Asn Ile Gly
90 95 100

GCC TCC GAC GCC TAT CTG TCG GAA GGT GAT ATG GCC GCG CAC AAG GGG 508
Ala Ser Asp Ala Tyr Leu Ser Glu Gly Asp Met Ala Ala His Lys Gly
105 110 115

CTG ATG AAC ATC GCG CTA GCC ATC TCC GCT CAG CAG GTC AAC TAC AAC 556
Leu Met Asn Ile Ala Leu Ala Ile Ser Ala Gln Gln Val Asn Tyr Asn
120 125 130 135

CTG CCC GGA GTG AGC GAG CAC CTC AAG CTG AAC GGA AAA GTC CTG GCG 604
Leu Pro Gly Val Ser Glu His Leu Lys Leu Asn Gly Lys Val Leu Ala
140 145 150

GCC ATG TAC CAG GGC ACC ATC AAA ACC TGG GAC GAC CCG CAG ATC GCT 652
Ala Met Tyr Gln Gly Thr Ile Lys Thr Trp Asp Asp Pro Gln Ile Ala
155 160 165

GCG CTC AAC CCC GGC GTG AAC CTG CCC GGC ACC GCG GTA GTT CCG CTG 700
Ala Leu Asn Pro Gly Val Asn Leu Pro Gly Thr Ala Val Val Pro Leu
170 175 180

CAC CGC TCC GAC GGG TCC GGT GAC ACC TTC TTG TTC ACC CAG TAC CTG 748
His Arg Ser Asp Gly Ser Gly Asp Thr Phe Leu Phe Thr Gln Tyr Leu
185 190 195

TCC AAG CAA GAT CCC GAG GGC TGG GGC AAG TCG CCC GGC TTC GGC ACC 796
Ser Lys Gln Asp Pro Glu Gly Trp Gly Lys Ser Pro Gly Phe Gly Thr
200 205 210 215

ACC GTC GAC TTC CCG GCG GTG CCG GGT GCG CTG GGT GAG AAC GGC AAC 844
Thr Val Asp Phe Pro Ala Val Pro Gly Ala Leu Gly Glu Asn Gly Asn
220 225 230

GGC GGC ATG GTG ACC GGT TGC GCC GAG ACA CCG GGC TGC GTG GCC TAT 892
Gly Gly Met Val Thr Gly Cys Ala Glu Thr Pro Gly Cys Val Ala Tyr
235 240 245

ATC GGC ATC AGC TTC CTC GAC CAG GCC AGT CAA CGG GGA CTC GGC GAG 940
Ile Gly Ile Ser Phe Leu Asp Gln Ala Ser Gln Arg Gly Leu Gly Glu
250 255 260

GCC CAA CTA GGC AAT AGC TCT GGC AAT TTC TTG TTG CCC GAC GCG CAA 988
Ala Gln Leu Gly Asn Ser Ser Gly Asn Phe Leu Leu Pro Asp Ala Gln
265 270 275

AGC ATT CAG GCC GCG GCG GCT GGC TTC GCA TCG AAA ACC CCG GCG AAC 1036
Ser Ile Gln Ala Ala Ala Ala Gly Phe Ala Ser Lys Thr Pro Ala Asn
280 285 290 295

CAG GCG ATT TCG ATG ATC GAC GGG CCC GCC CCG GAC GGC TAC CCG ATC 1084
Gln Ala Ile Ser Met Ile Asp Gly Pro Ala Pro Asp Gly Tyr Pro Ile
300 305 310

ATC AAC TAC GAG TAC GCC ATC GTC AAC AAC CGG CAA AAG GAC GCC GCC 1132
Ile Asn Tyr Glu Tyr Ala Ile Val Asn Asn Arg Gln Lys Asp Ala Ala
315 320 325

ACC GCG CAG ACC TTG CAG GCA TTT CTG CAC TGG GCG ATC ACC GAC GGC 1180
Thr Ala Gln Thr Leu Gln Ala Phe Leu His Trp Ala Ile Thr Asp Gly
330 335 340

AAC AAG GCC TCG TTC CTC GAC CAG GTT CAT TTC CAG CCG CTG CCG CCC 1228
Asn Lys Ala Ser Phe Leu Asp Gln Val His Phe Gln Pro Leu Pro Pro
345 350 355

GCG GTG GTG AAG TTG TCT GAC GCG TTG ATC GCG ACG ATT TCC AGC 1273
Ala Val Val Lys Leu Ser Asp Ala Leu Ile Ala Thr Ile Ser Ser
360 365 370



TAGCCTCGTT GACCACCACG CGACAGCAAC CTCCGTCGGG CCATCGGGCT GCTTTGCGGA 1333

GCATGCTGGC CCGTGCCGGT GAAGTCGGCC GCGCTGGCCC GGCCATCCGG TGGTTGGGTG 1393

GGATAGGTGC GGTGATCCCG CTGCTTGCGC TGGTCTTGGT GCTGGTGGTG CTGGTCATCG 1453

AGGCGATGGG TGCGATCAGG CTCAACGGGT TGCATTTCTT CACCGCCACC GAATGGAATC 1513

CAGGCAACAC CTACGGCGAA ACCGTTGTCA CCGACGCGTC GCCCATCCGG TCGGCGCCTA 1573


CTACGGGGCG 


TTGCCGCTGA 


TCGTCGGGAC 


GCTGGCGACC 


TCGGCAATCfi 


V— 1 Vj/11 




CGCGGTGCCG GTCTCTGTAG GAGCGGCGCT GGTGATCGTG GAACGGCTGC CGAAACGGTT 1693

GGCCGAGGCT GTGGGAATAG TCCTGGAATT GCTCGCCGGA ATCCCCAGCG TGGTCGTCGG 1753

TTTGTGGGGG GCAATGACGT TCGGGCCGTT CATCGCTCAT CACATCGCTC CGGTGATCGC 1813

TCACAACGCT CCCGATGTGC CGGTGCTGAA CTACTTGCGC GGCGACCCGG GCAACGGGGA 1873

GGGCATGTTG GTGTCCGGTC TGGTGTTGGC GGTGATGGTC GTTCCCATTA TCGCCACCAC 1933

CACTCATGAC CTGTTCCGGC AGGTGCCGGT GTTGCCCCGG GAGGGCGCGA TCGGGAATTC 1993



(2) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 374 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153: 

Val Lys Ile Arg Leu His Thr Leu Leu Ala Val Leu Thr Ala Ala Pro
1 5 10 15

Leu Leu Leu Ala Ala Ala Gly Cys Gly Ser Lys Pro Pro Ser Gly Ser
20 25 30

Pro Glu Thr Gly Ala Gly Ala Gly Thr Val Ala Thr Thr Pro Ala Ser
35 40 45

Ser Pro Val Thr Leu Ala Glu Thr Gly Ser Thr Leu Leu Tyr Pro Leu
50 55 60

Phe Asn Leu Trp Gly Pro Ala Phe His Glu Arg Tyr Pro Asn Val Thr
65 70 75 80

Ile Thr Ala Gln Gly Thr Gly Ser Gly Ala Gly Ile Ala Gln Ala Ala
85 90 95

Ala Gly Thr Val Asn Ile Gly Ala Ser Asp Ala Tyr Leu Ser Glu Gly
100 105 110

Asp Met Ala Ala His Lys Gly Leu Met Asn Ile Ala Leu Ala Ile Ser
115 120 125

Ala Gln Gln Val Asn Tyr Asn Leu Pro Gly Val Ser Glu His Leu Lys
130 135 140

Leu Asn Gly Lys Val Leu Ala Ala Met Tyr Gln Gly Thr Ile Lys Thr
145 150 155 160

Trp Asp Asp Pro Gln Ile Ala Ala Leu Asn Pro Gly Val Asn Leu Pro
165 170 175

Gly Thr Ala Val Val Pro Leu His Arg Ser Asp Gly Ser Gly Asp Thr
180 185 190

Phe Leu Phe Thr Gln Tyr Leu Ser Lys Gln Asp Pro Glu Gly Trp Gly
195 200 205

Lys Ser Pro Gly Phe Gly Thr Thr Val Asp Phe Pro Ala Val Pro Gly
210 215 220

Ala Leu Gly Glu Asn Gly Asn Gly Gly Met Val Thr Gly Cys Ala Glu
225 230 235 240

Thr Pro Gly Cys Val Ala Tyr Ile Gly Ile Ser Phe Leu Asp Gln Ala
245 250 255

Ser Gln Arg Gly Leu Gly Glu Ala Gln Leu Gly Asn Ser Ser Gly Asn
260 265 270

Phe Leu Leu Pro Asp Ala Gln Ser Ile Gln Ala Ala Ala Ala Gly Phe
275 280 285

Ala Ser Lys Thr Pro Ala Asn Gln Ala Ile Ser Met Ile Asp Gly Pro
290 295 300

Ala Pro Asp Gly Tyr Pro Ile Ile Asn Tyr Glu Tyr Ala Ile Val Asn
305 310 315 320

Asn Arg Gln Lys Asp Ala Ala Thr Ala Gln Thr Leu Gln Ala Phe Leu
325 330 335

His Trp Ala Ile Thr Asp Gly Asn Lys Ala Ser Phe Leu Asp Gln Val
340 345 350

His Phe Gln Pro Leu Pro Pro Ala Val Val Lys Leu Ser Asp Ala Leu
355 360 365

Ile Ala Thr Ile Ser Ser
370



(2) INFORMATION FOR SEQ ID NO: 154: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1993 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154: 
TGTTCTTCGA CGGCAGGCTG GTGGAGGAAG GGCCCACCGA ACAGCTGTTC TCCTCGCCGA 60

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: 
CTTCATGGAA TTCTCAGGCC GGTAAGGTCC GCTGCGG 
(2) INFORMATION FOR SEQ ID NO: 213: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7676 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: 



TGGCGAATGG GACGCGCCCT GTAGCGGCGC ATTAAGCGCG GCGGGTGTGG TGGTTACGCG 60

CAGCGTGACC GCTACACTTG CCAGCGCCCT AGCGCCCGCT CCTTTCGCTT TCTTCCCTTC 120

CTTTCTCGCC ACGTTCGCCG GCTTTCCCCG TCAAGCTCTA AATCGGGGGC TCCCTTTAGG 180

GTTCCGATTT AGTGCTTTAC GGCACCTCGA CCCCAAAAAA CTTGATTAGG GTGATGGTTC 240

ACGTAGTGGG CCATCGCCCT GATAGACGGT TTTTCGCCCT TTGACGTTGG AGTCCACGTT 300

CTTTAATAGT GGACTCTTGT TCCAAACTGG AACAACACTC AACCCTATCT CGGTCTATTC 360

TTTTGATTTA TAAGGGATTT TGCCGATTTC GGCCTATTGG TTAAAAAATG AGCTGATTTA 420

ACAAAAATTT AACGCGAATT TTAACAAAAT ATTAACGTTT ACAATTTCAG GTGGCACTTT 480

TCGGGGAAAT GTGCGCGGAA CCCCTATTTG TTTATTTTTC TAAATACATT CAAATATGTA 540

TCCGCTCATG AATTAATTCT TAGAAAAACT CATCGAGCAT CAAATGAAAC TGCAATTTAT 600

TCATATCAGG ATTATCAATA CCATATTTTT GAAAAAGCCG TTTCTGTAAT GAAGGAGAAA 660

ACTCACCGAG GCAGTTCCAT AGGATGGCAA GATCCTGGTA TCGGTCTGCG ATTCCGACTC 720

GTCCAACATC AATACAACCT ATTAATTTCC CCTCGTCAAA AATAAGGTTA TCAAGTGAGA 780

AATCACCATG AGTGACGACT GAATCCGGTG AGAATGGCAA AAGTTTATGC ATTTCTTTCC 840

AGACTTGTTC AACAGGCCAG CCATTACGCT CGTCATCAAA ATCACTCGCA TCAACCAAAC 900

CGTTATTCAT TCGTGATTGC GCCTGAGCGA GACGAAATAC GCGATCGCTG TTAAAAGGAC 960

AATTACAAAC AGGAATCGAA TGCAACCGGC GCAGGAACAC TGCCAGCGCA TCAACAATAT 1020

TTTCACCTGA ATCAGGATAT TCTTCTAATA CCTGGAATGC TGTTTTCCCG GGGATCGCAG 1080

TGGTGAGTAA CCATGCATCA TCAGGAGTAC GGATAAAATG CTTGATGGTC GGAAGAGGCA 1140

TAAATTCCGT CAGCCAGTTT AGTCTGACCA TCTCATCTGT AACATCATTG GCAACGCTAC 1200

CTTTGCCATG TTTCAGAAAC AACTCTGGCG CATCGGGCTT CCCATACAAT CGATAGATTG 1260

TCGCACCTGA TTGCCCGACA TTATCGCGAG CCCATTTATA CCCATATAAA TCAGCATCCA 1320

TGTTGGAATT TAATCGCGGC CTAGAGCAAG ACGTTTCCCG TTGAATATGG CTCATAACAC 1380

CCCTTGTATT ACTGTTTATG TAAGCAGACA GTTTTATTGT TCATGACCAA AATCCCTTAA 1440

CGTGAGTTTT CGTTCCACTG AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA 1500

GATCCTTTTT TTCTGCGCGT AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG 1560

GTGGTTTGTT TGCCGGATCA AGAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC 1620

AGAGCGCAGA TACCAAATAC TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG 1680

AACTCTGTAG CACCGCCTAC ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC 1740

AGTGGCGATA AGTCGTGTCT TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG 1800

CAGCGGTCGG GCTGAACGGG GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC 1860

ACCGAACTGA GATACCTACA GCGTGAGCTA TGAGAAAGCG CCACGCTTCC CGAAGGGAGA 1920

AAGGCGGACA GGTATCCGGT AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT 1980

CCAGGGGGAA ACGCCTGGTA TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG 2040

CGTCGATTTT TGTGATGCTC GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG 2100

GCCTTTTTAC GGTTCCTGGC CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA 2160

TCCCCTGATT CTGTGGATAA CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC 2220

AGCCGAACGA CCGAGCGCAG CGAGTCAGTG AGCGAGGAAG CGGAAGAGCG CCTGATGCGG 2280

TATTTTCTCC TTACGCATCT GTGCGGTATT TCACACCGCA TATATGGTGC ACTCTCAGTA 2340

CAATCTGCTC TGATGCCGCA TAGTTAAGCC AGTATACACT CCGCTATCGC TACGTGACTG 2400

GGTCATGGCT GCGCCCCGAC ACCCGCCAAC ACCCGCTGAC GCGCCCTGAC GGGCTTGTCT 2460

GCTCCCGGCA TCCGCTTACA GACAAGCTGT GACCGTCTCC GGGAGCTGCA TGTGTCAGAG 2520

GTTTTCACCG TCATCACCGA AACGCGCGAG GCAGCTGCGG TAAAGCTCAT CAGCGTGGTC 2580

GTGAAGCGAT TCACAGATGT CTGCCTGTTC ATCCGCGTCC AGCTCGTTGA GTTTCTCCAG 2640

AAGCGTTAAT GTCTGGCTTC TGATAAAGCG GGCCATGTTA AGGGCGGTTT TTTCCTGTTT 2700

GGTCACTGAT GCCTCCGTGT AAGGGGGATT TCTGTTCATG GGGGTAATGA TACCGATGAA 2760

ACGAGAGAGG ATGCTCACGA TACGGGTTAC TGATGATGAA CATGCCCGGT TACTGGAACG 2820

TTGTGAGGGT AAACAACTGG CGGTATGGAT GCGGCGGGAC CAGAGAAAAA TCACTCAGGG 2880

TCAATGCCAG CGCTTCGTTA ATACAGATGT AGGTGTTCCA CAGGGTAGCC AGCAGCATCC 2940

TGCGATGCAG ATCCGGAACA TAATGGTGCA GGGCGCTGAC TTCCGCGTTT CCAGACTTTA 3000

CGAAACACGG AAACCGAAGA CCATTCATGT TGTTGCTCAG GTCGCAGACG TTTTGCAGCA 3060

GCAGTCGCTT CACGTTCGCT CGCGTATCGG TGATTCATTC TGCTAACCAG TAAGGCAACC 3120

CCGCCAGCCT AGCCGGGTCC TCAACGACAG GAGCACGATC ATGCGCACCC GTGGGGCCGC 3180

CATGCCGGCG ATAATGGCCT GCTTCTCGCC GAAACGTTTG GTGGCGGGAC CAGTGACGAA 3240

GGCTTGAGCG AGGGCGTGCA AGATTCCGAA TACCGCAAGC GACAGGCCGA TCATCGTCGC 3300

GCTCCAGCGA AAGCGGTCCT CGCCGAAAAT GACCCAGAGC GCTGCCGGCA CCTGTCCTAC 3360

GAGTTGCATG ATAAAGAAGA CAGTCATAAG TGCGGCGACG ATAGTCATGC CCCGCGCCCA 3420

CCGGAAGGAG CTGACTGGGT TGAAGGCTCT CAAGGGCATC GGTCGAGATC CCGGTGCCTA 3480

ATGAGTGAGC TAACTTACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA 3540

CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT 3600

TGGGCGCCAG GGTGGTTTTT CTTTTCACCA GTGAGACGGG CAACAGCTGA TTGCCCTTCA 3660

CCGCCTGGCC CTGAGAGAGT TGCAGCAAGC GGTCCACGCT GGTTTGCCCC AGCAGGCGAA 3720

AATCCTGTTT GATGGTGGTT AACGGCGGGA TATAACATGA GCTGTCTTCG GTATCGTCGT 3780

ATCCCACTAC CGAGATATCC GCACCAACGC GCAGCCCGGA CTCGGTAATG GCGCGCATTG 3840

CGCCCAGCGC CATCTGATCG TTGGCAACCA GCATCGCAGT GGGAACGATG CCCTCATTCA 3900

GCATTTGCAT GGTTTGTTGA AAACCGGACA TGGCACTCCA GTCGCCTTCC CGTTCCGCTA 3960

TCGGCTGAAT TTGATTGCGA GTGAGATATT TATGCCAGCC AGCCAGACGC AGACGCGCCG 4020

AGACAGAACT TAATGGGCCC GCTAACAGCG CGATTTGCTG GTGACCCAAT GCGACCAGAT 4080

GCTCCACGCC CAGTCGCGTA CCGTCTTCAT GGGAGAAAAT AATACTGTTG ATGGGTGTCT 4140

GGTCAGAGAC ATCAAGAAAT AACGCCGGAA CATTAGTGCA GGCAGCTTCC ACAGCAATGG 4200

CATCCTGGTC ATCCAGCGGA TAGTTAATGA TCAGCCCACT GACGCGTTGC GCGAGAAGAT 4260

TGTGCACCGC CGCTTTACAG GCTTCGACGC CGCTTCGTTC TACCATCGAC ACCACCACGC 4320

TGGCACCCAG TTGATCGGCG CGAGATTTAA TCGCCGCGAC AATTTGCGAC GGCGCGTGCA 4380

GGGCCAGACT GGAGGTGGCA ACGCCAATCA GCAACGACTG TTTGCCCGCC AGTTGTTGTG 4440

CCACGCGGTT GGGAATGTAA TTCAGCTCCG CCATCGCCGC TTCCACTTTT TCCCGCGTTT 4500

TCGCAGAAAC GTGGCTGGCC TGGTTCACCA CGCGGGAAAC GGTCTGATAA GAGACACCGG 4560

CATACTCTGC GACATCGTAT AACGTTACTG GTTTCACATT CACCACCCTG AATTGACTCT 4620

CTTCCGGGCG CTATCATGCC ATACCGCGAA AGGTTTTGCG CCATTCGATG GTGTCCGGGA 4680

TCTCGACGCT CTCCCTTATG CGACTCCTGC ATTAGGAAGC AGCCCAGTAG TAGGTTGAGG 4740

CCGTTGAGCA CCGCCGCCGC AAGGAATGGT GCATGCAAGG AGATGGCGCC CAACAGTCCC 4800

CCGGCCACGG GGCCTGCCAC CATACCCACG CCGAAACAAG CGCTCATGAG CCCGAAGTGG 4860

CGAGCCCGAT CTTCCCCATC GGTGATGTCG GCGATATAGG CGCCAGCAAC CGCACCTGTG 4920

GCGCCGGTGA TGCCGGCCAC GATGCGTCCG GCGTAGAGGA TCGAGATCTC GATCCCGCGA 4980

AATTAATACG ACTCACTATA GGGGAATTGT GAGCGGATAA CAATTCCCCT CTAGAAATAA 5040

TTTTGTTTAA CTTTAAGAAG GAGATATACA TATGGGCCAT CATCATCATC ATCACGTGAT 5100

CGACATCATC GGGACCAGCC CCACATCCTG GGAACAGGCG GCGGCGGAGG CGGTCCAGCG 5160

GGCGCGGGAT AGCGTCGATG ACATCCGCGT CGCTCGGGTC ATTGAGCAGG ACATGGCCGT 5220

GGACAGCGCC GGCAAGATCA CCTACCGCAT CAAGCTCGAA GTGTCGTTCA AGATGAGGCC 5280

GGCGCAACCG AGGGGCTCGA AACCACCGAG CGGTTCGCCT GAAACGGGCG CCGGCGCCGG 5340

TACTGTCGCG ACTACCCCCG CGTCGTCGCC GGTGACGTTG GCGGAGACCG GTAGCACGCT 5400

GCTCTACCCG CTGTTCAACC TGTGGGGTCC GGCCTTTCAC GAGAGGTATC CGAACGTCAC 5460

GATCACCGCT CAGGGCACCG GTTCTGGTGC CGGGATCGCG CAGGCCGCCG CCGGGACGGT 5520

CAACATTGGG GCCTCCGACG CCTATCTGTC GGAAGGTGAT ATGGCCGCGC ACAAGGGGCT 5580

GATGAACATC GCGCTAGCCA TCTCCGCTCA GCAGGTCAAC TACAACCTGC CCGGAGTGAG 5640

CGAGCACCTC AAGCTGAACG GAAAAGTCCT GGCGGCCATG TACCAGGGCA CCATCAAAAC 5700

CTGGGACGAC CCGCAGATCG CTGCGCTCAA CCCCGGCGTG AACCTGCCCG GCACCGCGGT 5760

AGTTCCGCTG CACCGCTCCG ACGGGTCCGG TGACACCTTC TTGTTCACCC AGTACCTGTC 5820

CAAGCAAGAT CCCGAGGGCT GGGGCAAGTC GCCCGGCTTC GGCACCACCG TCGACTTCCC 5880

GGCGGTGCCG GGTGCGCTGG GTGAGAACGG CAACGGCGGC ATGGTGACCG GTTGCGCCGA 5940

GACACCGGGC TGCGTGGCCT ATATCGGCAT CAGCTTCCTC GACCAGGCCA GTCAACGGGG 6000

ACTCGGCGAG GCCCAACTAG GCAATAGCTC TGGCAATTTC TTGTTGCCCG ACGCGCAAAG 6060

CATTCAGGCC GCGGCGGCTG GCTTCGCATC GAAAACCCCG GCGAACCAGG CGATTTCGAT 6120

GATCGACGGG CCCGCCCCGG ACGGCTACCC GATCATCAAC TACGAGTACG CCATCGTCAA 6180

CAACCGGCAA AAGGACGCCG CCACCGCGCA GACCTTGCAG GCATTTCTGC ACTGGGCGAT 6240

CACCGACGGC AACAAGGCCT CGTTCCTCGA CCAGGTTCAT TTCCAGCCGC TGCCGCCCGC 6300


GGTGGTGAAG 


TTGTCTGACG 


CGTTGATCGC 


GACGATTTCC 






\J J O \J 


TGCCGCTACC CTCGCGCAGG AGGCAGGTAA TTTCGAGCGG ATCTCCGGCG ACCTGAAAAC 6420

CCAGATCGAC CAGGTGGAGT CGACGGCAGG TTCGTTGCAG GGCCAGTGGC GCGGCGCGGC 6480

GGGGACGGCC GCCCAGGCCG CGGTGGTGCG CTTCCAAGAA GCAGCCAATA AGCAGAAGCA 6540

GGAACTCGAC GAGATCTCGA CGAATATTCG TCAGGCCGGC GTCCAATACT CGAGGGCCGA 6600

CGAGGAGCAG CAGCAGGCGC TGTCCTCGCA AATGGGCTTT GTGCCCACAA CGGCCGCCTC 6660

GCCGCCGTCG ACCGCTGCAG CGCCACCCGC ACCGGCGACA CCTGTTGCCC CCCCACCACC 6720

GGCCGCCGCC AACACGCCGA ATGCCCAGCC GGGCGATCCC AACGCAGCAC CTCCGCCGGC 6780

CGACCCGAAC GCACCGCCGC CACCTGTCAT TGCCCCAAAC GCACCCCAAC CTGTCCGGAT 6840

CGACAACCCG GTTGGAGGAT TCAGCTTCGC GCTGCCTGCT GGCTGGGTGG AGTCTGACGC 6900

CGCCCACTTC GACTACGGTT CAGCACTCCT CAGCAAAACC ACCGGGGACC CGCCATTTCC 6960

CGGACAGCCG CCGCCGGTGG CCAATGACAC CCGTATCGTG CTCGGCCGGC TAGACCAAAA 7020

GCTTTACGCC AGCGCCGAAG CCACCGACTC CAAGGCCGCG GCCCGGTTGG GCTCGGACAT 7080

GGGTGAGTTC TATATGCCCT ACCCGGGCAC CCGGATCAAC CAGGAAACCG TCTCGCTTGA 7140

CGCCAACGGG GTGTCTGGAA GCGCGTCGTA TTACGAAGTC AAGTTCAGCG ATCCGAGTAA 7200

GCCGAACGGC CAGATCTGGA CGGGCGTAAT CGGCTCGCCC GCGGCGAACG CACCGGACGC 7260


CGGGCCCCCT 


CAGCGCTGGT 


TTGTGGTATG 


GCTCGGGACC 


GCCAACAAPr 


wool ovj-rt^rxri 




GGGCGCGGCC AAGGCGCTGG CCGAATCGAT CCGGCCTTTG GTCGCCCCGC CGCCGGCGCC 7380

GGCACCGGCT CCTGCAGAGC CCGCTCCGGC GCCGGCGCCG GCCGGGGAAG TCGCTCCTAC 7440

CCCGACGACA CCGACACCGC AGCGGACCTT ACCGGCCTGA GAATTCTGCA GATATCCATC 7500

ACACTGGCGG CCGCTCGAGC ACCACCACCA CCACCACTGA GATCCGGCTG CTAACAAAGC 7560

CCGAAAGGAA GCTGAGTTGG CTGCTGCCAC CGCTGAGCAA TAACTAGCAT AACCCCTTGG 7620

GGCCTCTAAA CGGGTCTTGA GGGGTTTTTT GCTGAAAGGA GGAACTATAT CCGGAT 7676



(2) INFORMATION FOR SEQ ID NO: 214: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 802 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214: 

Met Gly His His His His His His Val Ile Asp Ile Ile Gly Thr Ser
1 5 10 15

Pro Thr Ser Trp Glu Gln Ala Ala Ala Glu Ala Val Gln Arg Ala Arg
20 25 30

Asp Ser Val Asp Asp Ile Arg Val Ala Arg Val Ile Glu Gln Asp Met
35 40 45

Ala Val Asp Ser Ala Gly Lys Ile Thr Tyr Arg Ile Lys Leu Glu Val
50 55 60

Ser Phe Lys Met Arg Pro Ala Gln Pro Arg Gly Ser Lys Pro Pro Ser
65 70 75 80

Gly Ser Pro Glu Thr Gly Ala Gly Ala Gly Thr Val Ala Thr Thr Pro
85 90 95

Ala Ser Ser Pro Val Thr Leu Ala Glu Thr Gly Ser Thr Leu Leu Tyr
100 105 110

Pro Leu Phe Asn Leu Trp Gly Pro Ala Phe His Glu Arg Tyr Pro Asn
115 120 125

Val Thr Ile Thr Ala Gln Gly Thr Gly Ser Gly Ala Gly Ile Ala Gln
130 135 140

Ala Ala Ala Gly Thr Val Asn Ile Gly Ala Ser Asp Ala Tyr Leu Ser
145 150 155 160

Glu Gly Asp Met Ala Ala His Lys Gly Leu Met Asn Ile Ala Leu Ala
165 170 175

Ile Ser Ala Gln Gln Val Asn Tyr Asn Leu Pro Gly Val Ser Glu His
180 185 190

Leu Lys Leu Asn Gly Lys Val Leu Ala Ala Met Tyr Gln Gly Thr Ile
195 200 205

Lys Thr Trp Asp Asp Pro Gln Ile Ala Ala Leu Asn Pro Gly Val Asn
210 215 220

Leu Pro Gly Thr Ala Val Val Pro Leu His Arg Ser Asp Gly Ser Gly
225 230 235 240

Asp Thr Phe Leu Phe Thr Gln Tyr Leu Ser Lys Gln Asp Pro Glu Gly
245 250 255

Trp Gly Lys Ser Pro Gly Phe Gly Thr Thr Val Asp Phe Pro Ala Val
260 265 270

Pro Gly Ala Leu Gly Glu Asn Gly Asn Gly Gly Met Val Thr Gly Cys
275 280 285

Ala Glu Thr Pro Gly Cys Val Ala Tyr Ile Gly Ile Ser Phe Leu Asp
290 295 300

Gln Ala Ser Gln Arg Gly Leu Gly Glu Ala Gln Leu Gly Asn Ser Ser
305 310 315 320

Gly Asn Phe Leu Leu Pro Asp Ala Gln Ser Ile Gln Ala Ala Ala Ala
325 330 335

Gly Phe Ala Ser Lys Thr Pro Ala Asn Gln Ala Ile Ser Met Ile Asp
340 345 350

Gly Pro Ala Pro Asp Gly Tyr Pro Ile Ile Asn Tyr Glu Tyr Ala Ile
355 360 365

Val Asn Asn Arg Gln Lys Asp Ala Ala Thr Ala Gln Thr Leu Gln Ala
370 375 380

Phe Leu His Trp Ala Ile Thr Asp Gly Asn Lys Ala Ser Phe Leu Asp
385 390 395 400

Gln Val His Phe Gln Pro Leu Pro Pro Ala Val Val Lys Leu Ser Asp
405 410 415

Ala Leu Ile Ala Thr Ile Ser Ser Ala Glu Met Lys Thr Asp Ala Ala
420 425 430

Thr Leu Ala Gln Glu Ala Gly Asn Phe Glu Arg Ile Ser Gly Asp Leu
435 440 445

Lys Thr Gln Ile Asp Gln Val Glu Ser Thr Ala Gly Ser Leu Gln Gly
450 455 460

Gln Trp Arg Gly Ala Ala Gly Thr Ala Ala Gln Ala Ala Val Val Arg
465 470 475 480

Phe Gln Glu Ala Ala Asn Lys Gln Lys Gln Glu Leu Asp Glu Ile Ser
485 490 495

Thr Asn Ile Arg Gln Ala Gly Val Gln Tyr Ser Arg Ala Asp Glu Glu
500 505 510

Gln Gln Gln Ala Leu Ser Ser Gln Met Gly Phe Val Pro Thr Thr Ala
515 520 525

Ala Ser Pro Pro Ser Thr Ala Ala Ala Pro Pro Ala Pro Ala Thr Pro
530 535 540

Val Ala Pro Pro Pro Pro Ala Ala Ala Asn Thr Pro Asn Ala Gln Pro
545 550 555 560

Gly Asp Pro Asn Ala Ala Pro Pro Pro Ala Asp Pro Asn Ala Pro Pro
565 570 575

Pro Pro Val Ile Ala Pro Asn Ala Pro Gln Pro Val Arg Ile Asp Asn
580 585 590

Pro Val Gly Gly Phe Ser Phe Ala Leu Pro Ala Gly Trp Val Glu Ser
595 600 605

Asp Ala Ala His Phe Asp Tyr Gly Ser Ala Leu Leu Ser Lys Thr Thr
610 615 620

Gly Asp Pro Pro Phe Pro Gly Gln Pro Pro Pro Val Ala Asn Asp Thr
625 630 635 640

Arg Ile Val Leu Gly Arg Leu Asp Gln Lys Leu Tyr Ala Ser Ala Glu
645 650 655

Ala Thr Asp Ser Lys Ala Ala Ala Arg Leu Gly Ser Asp Met Gly Glu
660 665 670

Phe Tyr Met Pro Tyr Pro Gly Thr Arg Ile Asn Gln Glu Thr Val Ser
675 680 685

Leu Asp Ala Asn Gly Val Ser Gly Ser Ala Ser Tyr Tyr Glu Val Lys
690 695 700

Phe Ser Asp Pro Ser Lys Pro Asn Gly Gln Ile Trp Thr Gly Val Ile
705 710 715 720

Gly Ser Pro Ala Ala Asn Ala Pro Asp Ala Gly Pro Pro Gln Arg Trp
725 730 735

Phe Val Val Trp Leu Gly Thr Ala Asn Asn Pro Val Asp Lys Gly Ala
740 745 750

Ala Lys Ala Leu Ala Glu Ser Ile Arg Pro Leu Val Ala Pro Pro Pro
755 760 765

Ala Pro Ala Pro Ala Pro Ala Glu Pro Ala Pro Ala Pro Ala Pro Ala
770 775 780

Gly Glu Val Ala Pro Thr Pro Thr Thr Pro Thr Pro Gln Arg Thr Leu
785 790 795 800

Pro Ala

CLAIMS

1. A polypeptide comprising an immunogenic portion of a soluble 
M. tuberculosis antigen, or a variant of said antigen that differs only in conservative 
substitutions and/or modifications, wherein said antigen has an N-terminal sequence selected
from the group consisting of: 

(a) Asp-Pro-Val-Asp-Ala-Val-Ile-Asn-Thr-Thr-Cys-Asn-Tyr-Gly-Gln- 
Val-Val-Ala-Ala-Leu; (SEQ ID No. 120) 

(b) Ala-Val-Glu-Ser-Gly-Met-Leu-Ala-Leu-Gly-Thr-Pro-Ala-Pro-Ser;
(SEQ ID No. 121) 

(c) Ala-Ala-Met-Lys-Pro-Arg-Thr-Gly-Asp-Gly-Pro-Leu-Glu-Ala-Ala- 
Lys-Glu-Gly-Arg; (SEQ ID No. 122) 

(d) Tyr-Tyr-Trp-Cys-Pro-Gly-Gln-Pro-Phe-Asp-Pro-Ala-Trp-Gly-Pro; 
(SEQ ID No. 123) 

(e) Asp-Ile-Gly-Ser-Glu-Ser-Thr-Glu-Asp-Gln-Gln-Xaa-Ala-Val; (SEQ 
ID No. 124) 

(f) Ala-Glu-Glu-Ser-Ile-Ser-Thr-Xaa-Glu-Xaa-Ile-Val-Pro; (SEQ ID No. 
125) 

(g) Asp-Pro-Glu-Pro-Ala-Pro-Pro-Val-Pro-Thr-Thr-Ala-Ala-Ser-Pro-Pro- 
Ser; (SEQ ID No. 126) 

(h) Ala-Pro-Lys-Thr-Tyr-Xaa-Glu-Glu-Leu-Lys-Gly-Thr-Asp-Thr-Gly; 
(SEQ ID No. 127) 

(i) Asp-Pro-Ala-Ser-Ala-Pro-Asp-Val-Pro-Thr-Ala-Ala-Gln-Leu-Thr-Ser- 
Leu-Leu-Asn-Ser-Leu-Ala-Asp-Pro-Asn-Val-Ser-Phe-Ala-Asn; (SEQ 
ID No. 128) and 

(j) Ala-Pro-Glu-Ser-Gly-Ala-Gly-Leu-Gly-Gly-Thr-Val-Gln-Ala-Gly; 
(SEQ ID No. 136) 
wherein Xaa may be any amino acid. 

2. A polypeptide comprising an immunogenic portion of an 
M. tuberculosis antigen, or a variant of said antigen that differs only in conservative 
substitutions and/or modifications, wherein said antigen has an N-terminal sequence selected 
from the group consisting of: 

(a) Asp-Pro-Pro-Asp-Pro-His-Gln-Xaa-Asp-Met-Thr-Lys-Gly-Tyr-Tyr- 
Pro-Gly-Gly-Arg-Arg-Xaa-Phe; (SEQ ID No. 129) and 

(b) Xaa-Tyr-Ile-Ala-Tyr-Xaa-Thr-Thr-Ala-Gly-Ile-Val-Pro-Gly-Lys-Ile- 
Asn-Val-His-Leu-Val; (SEQ ID No. 137), wherein Xaa may be any 
amino acid. 

3. A polypeptide comprising an immunogenic portion of a soluble 
M. tuberculosis antigen, or a variant of said antigen that differs only in conservative 
substitutions and/or modifications, wherein said antigen comprises an amino acid sequence 
encoded by a DNA sequence selected from the group consisting of the sequences recited in 
SEQ ID Nos.: 1, 2, 4-10, 13-25, 52, 99 and 101, the complements of said sequences, and 
DNA sequences that hybridize to a sequence recited in SEQ ID Nos.: 1, 2, 4-10, 13-25, 52, 
99 and 101 or a complement thereof under moderately stringent conditions. 

4. A polypeptide comprising an immunogenic portion of a 
M. tuberculosis antigen, or a variant of said antigen that differs only in conservative 
substitutions and/or modifications, wherein said antigen comprises an amino acid sequence 
encoded by a DNA sequence selected from the group consisting of the sequences recited in 
SEQ ID Nos.: 26-51, 138, 139, 163-183 and 201, the complements of said sequences, and 
DNA sequences that hybridize to a sequence recited in SEQ ID Nos.: 26-51, 138, 139, 163-183 
and 201 or a complement thereof under moderately stringent conditions. 

5. A DNA molecule comprising a nucleotide sequence encoding a 
polypeptide according to any one of claims 1-4. 

6. An expression vector comprising a DNA molecule according to claim 5. 

7. A host cell transformed with an expression vector according to claim 6. 

8. The host cell of claim 7 wherein the host cell is selected from the group 
consisting of E. coli, yeast and mammalian cells. 

9. A pharmaceutical composition comprising one or more polypeptides 
according to any one of claims 1-4 and a physiologically acceptable carrier. 

10. A pharmaceutical composition comprising one or more DNA 
molecules according to claim 5 and a physiologically acceptable carrier. 

11. A pharmaceutical composition comprising one or more DNA 
sequences recited in SEQ ID Nos.: 3, 11, 12, 140 and 141; and a physiologically acceptable 
carrier. 

12. A vaccine comprising one or more polypeptides according to any one 
of claims 1-4 and a non-specific immune response enhancer. 

13. A vaccine comprising: 

a polypeptide having an N-terminal sequence selected from the group 
consisting of sequences recited in SEQ ID NO: 134 and 135; and 
a non-specific immune response enhancer. 

14. A vaccine comprising: 

one or more polypeptides encoded by a DNA sequence selected from the 
group consisting of SEQ ID Nos.: 3, 11, 12, 140 and 141, the complements of said 
sequences, and DNA sequences that hybridize to a sequence recited in SEQ ID Nos.: 3, 11, 
12, 140 and 141; and 

a non-specific immune response enhancer. 

15. The vaccine of claims 12-14 wherein the non-specific immune 
response enhancer is an adjuvant. 

16. A vaccine comprising one or more DNA molecules according to claim 
5 and a non-specific immune response enhancer. 

17. A vaccine comprising one or more DNA sequences recited in SEQ ID 
Nos.: 3, 11, 12, 140 and 141; and a non-specific immune response enhancer. 

18. The vaccine of claims 16 or 17 wherein the non-specific immune 
response enhancer is an adjuvant. 

19. A pharmaceutical composition according to any one of claims 9-11, for 
use in the manufacture of a medicament for inducing protective immunity in a patient. 

20. A vaccine according to any one of claims 12-18, for use in the 
manufacture of a medicament for inducing protective immunity in a patient. 

21. A fusion protein comprising two or more polypeptides according to 
any one of claims 1-4. 

22. A fusion protein comprising one or more polypeptides according to 
any one of claims 1-4 and ESAT-6. 

23. A fusion protein comprising one or more polypeptides according to 
any one of claims 1-4 and the M. tuberculosis antigen 38 kD (SEQ ID NO: 155). 

24. A pharmaceutical composition comprising a fusion protein according 
to any one of claims 21-23 and a physiologically acceptable carrier. 

25. A vaccine comprising a fusion protein according to any one of claims 
21-23 and a non-specific immune response enhancer. 



26. The vaccine of claim 25 wherein the non-specific immune response 
enhancer is an adjuvant. 

27. A pharmaceutical composition according to claim 24, for use in the 
manufacture of a medicament for inducing protective immunity in a patient. 

28. A vaccine according to claims 25 or 26, for use in the manufacture of a 
medicament for inducing protective immunity in a patient. 

29. A method for detecting tuberculosis in a patient, comprising: 

(a) contacting dermal cells of a patient with one or more polypeptides 
according to any one of claims 1-4; and 

(b) detecting an immune response on the patient's skin and therefrom 
detecting tuberculosis in the patient. 

30. A method for detecting tuberculosis in a patient, comprising: 

(a) contacting dermal cells of a patient with a polypeptide having an N- 
terminal sequence selected from the group consisting of sequences recited in SEQ ID NO: 
134 and 135; and 

(b) detecting an immune response on the patient's skin and therefrom 
detecting tuberculosis in the patient. 

31. A method for detecting tuberculosis in a patient, comprising: 

(a) contacting dermal cells of a patient with one or more polypeptides 
encoded by a DNA sequence selected from the group consisting of SEQ ID Nos.: 3, 11, 12, 
140, 141, 156-160, 189-193, 199, 200 and 203, the complements of said sequences, and DNA 
sequences that hybridize to a sequence recited in SEQ ID Nos.: 3, 11, 12, 140, 141, 156-160, 
189-193, 199, 200 and 203; and 

(b) detecting an immune response on the patient's skin and therefrom 
detecting tuberculosis in the patient. 

32. The method of any one of claims 29-31 wherein the immune response 
is induration. 

33. A diagnostic kit comprising: 

(a) a polypeptide according to any one of claims 1-4; and 

(b) apparatus sufficient to contact said polypeptide with the dermal cells of 
a patient. 



34. A diagnostic kit comprising: 

(a) a polypeptide having an N-terminal sequence selected from the group 
consisting of sequences recited in SEQ ID NO: 134 and 135; and 

(b) apparatus sufficient to contact said polypeptide with the dermal cells of 
a patient. 

35. A diagnostic kit comprising: 

(a) a polypeptide encoded by a DNA sequence selected from the group 
consisting of SEQ ID Nos.: 3, 11, 12, 140, 141, 156-160, 189-193, 199, 200 and 203, the 
complements of said sequences, and DNA sequences that hybridize to a sequence recited in 
SEQ ID Nos.: 3, 11, 12, 140, 141, 156-160, 189-193, 199, 200 and 203; and 

(b) apparatus sufficient to contact said polypeptide with the dermal cells of 
a patient. 



36. A diagnostic kit comprising: 

(a) a fusion protein according to any one of claims 21-23; and 

(b) apparatus sufficient to contact said fusion protein with the dermal cells of a 
patient. 



37. A fusion protein according to claim 23 comprising an amino acid 
sequence selected from the group consisting of sequences recited in SEQ ID NO: 153 and 
209. 